Commit b23eee0c authored by Stzpz, committed by Francisco Massa

Supported FBNet architecture. (#463)

* Supported any feature map size for average pooling, since different models may produce feature maps of different sizes.

* Used the registry to register keypoint and mask heads.

* Passed in/out channels between modules when creating the model. This simplifies the code that computes the input channels for feature extractors and makes the predictors independent of the backbone architecture.
* Passed in_channels to the RPN and head builders.
* Set out_channels on model modules, including the backbone and feature extractors.
* Moved cfg.MODEL.BACKBONE.OUT_CHANNELS to cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS, since it is not used by all architectures. Updated the config files accordingly.

For new architecture modules, the returned module needs to expose a field called `out_channels` that indicates its output channel size, as sketched below.
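A minimal sketch of that contract for a hypothetical custom backbone (the registry name and layer width below are made up for illustration; the pattern mirrors `build_resnet_backbone` in this diff):

```python
from collections import OrderedDict

import torch.nn as nn

from maskrcnn_benchmark.modeling import registry


@registry.BACKBONES.register("MyTinyBackbone")  # hypothetical name, illustration only
def build_my_tiny_backbone(cfg):
    body = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )
    model = nn.Sequential(OrderedDict([("body", body)]))
    # Downstream builders (RPN, ROI heads) read this attribute instead of a
    # hard-coded cfg.MODEL.BACKBONE.OUT_CHANNELS value.
    model.out_channels = 64
    return model
```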

* Added unit tests for box_coder and nms.
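A hedged sketch of the kind of checks intended, assuming the compiled `nms` extension is available and the `BoxCoder`/`nms` signatures used elsewhere in this repo (the actual tests live under `tests/` and may differ):

```python
import torch

from maskrcnn_benchmark.layers import nms
from maskrcnn_benchmark.modeling.box_coder import BoxCoder


def test_nms_suppresses_overlapping_box():
    boxes = torch.tensor(
        [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0], [50.0, 50.0, 60.0, 60.0]]
    )
    scores = torch.tensor([0.9, 0.8, 0.7])
    # The second box overlaps the first with IoU > 0.5 and should be suppressed.
    keep = nms(boxes, scores, 0.5)
    assert keep.tolist() == [0, 2]


def test_box_coder_roundtrip():
    coder = BoxCoder(weights=(10.0, 10.0, 5.0, 5.0))
    proposals = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
    gt_boxes = torch.tensor([[1.0, 2.0, 9.0, 12.0]])
    deltas = coder.encode(gt_boxes, proposals)
    decoded = coder.decode(deltas, proposals)
    assert torch.allclose(decoded, gt_boxes, atol=1e-4)
```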

* Added FBNet architecture.

* FBNet is a general architecture definition that supports efficient architecture search and MaskRCNN2GO.
* Included various efficient building blocks (inverted residual, shuffle, separate dw conv, dw upsampling, etc.).
* Supported building the backbone, RPN, detection, keypoint, and mask heads from these building blocks.
* Architectures can be defined in `fbnet_modeldef.py` or passed directly via `cfg.MODEL.FBNET.ARCH_DEF` (see the sketch below).
* A few baseline architectures are included.
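A hedged sketch of the second option, passing the architecture directly through the config. The dict below is a placeholder rather than a tested architecture; the exact schema is defined by `fbnet_builder.py`/`fbnet_modeldef.py`, and the arch name must not collide with an entry in `MODEL_ARCH`:

```python
import json

from maskrcnn_benchmark.config import cfg

# Placeholder architecture definition (same shape as the entries in
# fbnet_modeldef.py); not a tuned or tested architecture.
arch_def = {
    "block_op_type": [["ir_k3"]],
    "block_cfg": {
        "first": [32, 2],
        "stages": [[[1, 16, 1, 1]]],  # [t, c, n, s]
        "last": [1280, 0.0],
        "backbone": [0],
    },
}

cfg.merge_from_list([
    "MODEL.BACKBONE.CONV_BODY", "FBNet",
    "MODEL.FBNET.ARCH", "my_custom_arch",          # hypothetical name, not in MODEL_ARCH
    "MODEL.FBNET.ARCH_DEF", json.dumps(arch_def),  # ARCH_DEF is passed as a JSON string
])
```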

* Added various unit tests.

* build and run backbones.
* build and run feature extractors.
* build and run predictors.

* Added a unit test to verify all config files are loadable.
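Such a test can be roughly as simple as cloning the default config and merging every YAML file under `configs/` (a sketch; the path and the actual test added here may differ):

```python
import glob
import os

from maskrcnn_benchmark.config import cfg


def test_all_configs_are_loadable(config_root="configs"):  # assumed location
    for path in glob.glob(os.path.join(config_root, "**", "*.yaml"), recursive=True):
        local_cfg = cfg.clone()
        # merge_from_file raises if a key is unknown or has the wrong type.
        local_cfg.merge_from_file(path)
```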
parent 192261db
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/35857890/e2e_faster_rcnn_R-101-FPN_1x"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/35857345/e2e_faster_rcnn_R-50-FPN_1x"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/36761737/e2e_faster_rcnn_X-101-32x8d-FPN_1x"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/37697547/e2e_keypoint_rcnn_R-50-FPN_1x"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/35861795/e2e_mask_rcnn_R-101-FPN_1x"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/35858933/e2e_mask_rcnn_R-50-FPN_1x"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/37129812/e2e_mask_rcnn_X-152-32x8d-FPN-IN5k_1.44x"
   BACKBONE:
     CONV_BODY: "R-152-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,11 @@ MODEL:
   WEIGHT: "catalog://Caffe2Detectron/COCO/36761843/e2e_mask_rcnn_X-101-32x8d-FPN_1x"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
+    STRIDE_IN_1X1: False
+    NUM_GROUPS: 32
+    WIDTH_PER_GROUP: 8
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
@@ -27,10 +31,6 @@ MODEL:
     POOLER_SAMPLING_RATIO: 2
     RESOLUTION: 28
     SHARE_BOX_FEATURE_EXTRACTOR: False
-  RESNETS:
-    STRIDE_IN_1X1: False
-    NUM_GROUPS: 32
-    WIDTH_PER_GROUP: 8
   MASK_ON: True
 DATASETS:
   TEST: ("coco_2014_minival",)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-101"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,6 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
@@ -20,6 +19,7 @@ MODEL:
     FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
     PREDICTOR: "FPNPredictor"
   RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     NUM_GROUPS: 32
     WIDTH_PER_GROUP: 8
...
MODEL:
META_ARCHITECTURE: "GeneralizedRCNN"
BACKBONE:
CONV_BODY: FBNet
FBNET:
ARCH: "default"
BN_TYPE: "bn"
WIDTH_DIVISOR: 8
DW_CONV_SKIP_BN: True
DW_CONV_SKIP_RELU: True
RPN:
ANCHOR_SIZES: (16, 32, 64, 128, 256)
ANCHOR_STRIDE: (16, )
BATCH_SIZE_PER_IMAGE: 256
PRE_NMS_TOP_N_TRAIN: 6000
PRE_NMS_TOP_N_TEST: 6000
POST_NMS_TOP_N_TRAIN: 2000
POST_NMS_TOP_N_TEST: 1000
RPN_HEAD: FBNet.rpn_head
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 512
ROI_BOX_HEAD:
POOLER_RESOLUTION: 6
FEATURE_EXTRACTOR: FBNet.roi_head
NUM_CLASSES: 81
DATASETS:
TRAIN: ("coco_2014_train", "coco_2014_valminusminival")
TEST: ("coco_2014_minival",)
SOLVER:
BASE_LR: 0.06
WARMUP_FACTOR: 0.1
WEIGHT_DECAY: 0.0001
STEPS: (60000, 80000)
MAX_ITER: 90000
IMS_PER_BATCH: 128 # for 8GPUs
# TEST:
# IMS_PER_BATCH: 8
INPUT:
MIN_SIZE_TRAIN: (320, )
MAX_SIZE_TRAIN: 640
MIN_SIZE_TEST: 320
MAX_SIZE_TEST: 640
PIXEL_MEAN: [103.53, 116.28, 123.675]
PIXEL_STD: [57.375, 57.12, 58.395]
MODEL:
META_ARCHITECTURE: "GeneralizedRCNN"
BACKBONE:
CONV_BODY: FBNet
FBNET:
ARCH: "default"
BN_TYPE: "bn"
WIDTH_DIVISOR: 8
DW_CONV_SKIP_BN: True
DW_CONV_SKIP_RELU: True
RPN:
ANCHOR_SIZES: (32, 64, 128, 256, 512)
ANCHOR_STRIDE: (16, )
BATCH_SIZE_PER_IMAGE: 256
PRE_NMS_TOP_N_TRAIN: 6000
PRE_NMS_TOP_N_TEST: 6000
POST_NMS_TOP_N_TRAIN: 2000
POST_NMS_TOP_N_TEST: 1000
RPN_HEAD: FBNet.rpn_head
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 256
ROI_BOX_HEAD:
POOLER_RESOLUTION: 6
FEATURE_EXTRACTOR: FBNet.roi_head
NUM_CLASSES: 81
DATASETS:
TRAIN: ("coco_2014_train", "coco_2014_valminusminival")
TEST: ("coco_2014_minival",)
SOLVER:
BASE_LR: 0.06
WARMUP_FACTOR: 0.1
WEIGHT_DECAY: 0.0001
STEPS: (60000, 80000)
MAX_ITER: 90000
IMS_PER_BATCH: 128 # for 8GPUs
# TEST:
# IMS_PER_BATCH: 8
INPUT:
MIN_SIZE_TRAIN: (600, )
MAX_SIZE_TRAIN: 1000
MIN_SIZE_TEST: 600
MAX_SIZE_TEST: 1000
PIXEL_MEAN: [103.53, 116.28, 123.675]
PIXEL_STD: [57.375, 57.12, 58.395]
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-101"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,11 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
+    STRIDE_IN_1X1: False
+    NUM_GROUPS: 32
+    WIDTH_PER_GROUP: 8
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
@@ -27,10 +31,6 @@ MODEL:
     POOLER_SAMPLING_RATIO: 2
     RESOLUTION: 28
     SHARE_BOX_FEATURE_EXTRACTOR: False
-  RESNETS:
-    STRIDE_IN_1X1: False
-    NUM_GROUPS: 32
-    WIDTH_PER_GROUP: 8
   MASK_ON: True
 DATASETS:
   TRAIN: ("coco_2014_train", "coco_2014_valminusminival")
...
MODEL:
META_ARCHITECTURE: "GeneralizedRCNN"
BACKBONE:
CONV_BODY: FBNet
FBNET:
ARCH: "default"
BN_TYPE: "bn"
WIDTH_DIVISOR: 8
DW_CONV_SKIP_BN: True
DW_CONV_SKIP_RELU: True
DET_HEAD_LAST_SCALE: -1.0
RPN:
ANCHOR_SIZES: (16, 32, 64, 128, 256)
ANCHOR_STRIDE: (16, )
BATCH_SIZE_PER_IMAGE: 256
PRE_NMS_TOP_N_TRAIN: 6000
PRE_NMS_TOP_N_TEST: 6000
POST_NMS_TOP_N_TRAIN: 2000
POST_NMS_TOP_N_TEST: 1000
RPN_HEAD: FBNet.rpn_head
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 256
ROI_BOX_HEAD:
POOLER_RESOLUTION: 6
FEATURE_EXTRACTOR: FBNet.roi_head
NUM_CLASSES: 81
ROI_MASK_HEAD:
POOLER_RESOLUTION: 6
FEATURE_EXTRACTOR: FBNet.roi_head_mask
PREDICTOR: "MaskRCNNConv1x1Predictor"
RESOLUTION: 12
SHARE_BOX_FEATURE_EXTRACTOR: False
MASK_ON: True
DATASETS:
TRAIN: ("coco_2014_train", "coco_2014_valminusminival")
TEST: ("coco_2014_minival",)
SOLVER:
BASE_LR: 0.06
WARMUP_FACTOR: 0.1
WEIGHT_DECAY: 0.0001
STEPS: (60000, 80000)
MAX_ITER: 90000
IMS_PER_BATCH: 128 # for 8GPUs
# TEST:
# IMS_PER_BATCH: 8
INPUT:
MIN_SIZE_TRAIN: (320, )
MAX_SIZE_TRAIN: 640
MIN_SIZE_TEST: 320
MAX_SIZE_TEST: 640
PIXEL_MEAN: [103.53, 116.28, 123.675]
PIXEL_STD: [57.375, 57.12, 58.395]
MODEL:
META_ARCHITECTURE: "GeneralizedRCNN"
BACKBONE:
CONV_BODY: FBNet
FBNET:
ARCH: "xirb16d_dsmask"
BN_TYPE: "bn"
WIDTH_DIVISOR: 8
DW_CONV_SKIP_BN: True
DW_CONV_SKIP_RELU: True
DET_HEAD_LAST_SCALE: -1.0
RPN:
ANCHOR_SIZES: (16, 32, 64, 128, 256)
ANCHOR_STRIDE: (16, )
BATCH_SIZE_PER_IMAGE: 256
PRE_NMS_TOP_N_TRAIN: 6000
PRE_NMS_TOP_N_TEST: 6000
POST_NMS_TOP_N_TRAIN: 2000
POST_NMS_TOP_N_TEST: 1000
RPN_HEAD: FBNet.rpn_head
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 512
ROI_BOX_HEAD:
POOLER_RESOLUTION: 6
FEATURE_EXTRACTOR: FBNet.roi_head
NUM_CLASSES: 81
ROI_MASK_HEAD:
POOLER_RESOLUTION: 6
FEATURE_EXTRACTOR: FBNet.roi_head_mask
PREDICTOR: "MaskRCNNConv1x1Predictor"
RESOLUTION: 12
SHARE_BOX_FEATURE_EXTRACTOR: False
MASK_ON: True
DATASETS:
TRAIN: ("coco_2014_train", "coco_2014_valminusminival")
TEST: ("coco_2014_minival",)
SOLVER:
BASE_LR: 0.06
WARMUP_FACTOR: 0.1
WEIGHT_DECAY: 0.0001
STEPS: (60000, 80000)
MAX_ITER: 90000
IMS_PER_BATCH: 128 # for 8GPUs
# TEST:
# IMS_PER_BATCH: 8
INPUT:
MIN_SIZE_TRAIN: (320, )
MAX_SIZE_TRAIN: 640
MIN_SIZE_TEST: 320
MAX_SIZE_TEST: 640
PIXEL_MEAN: [103.53, 116.28, 123.675]
PIXEL_STD: [57.375, 57.12, 58.395]
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,8 +8,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50-GN"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,8 +8,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50-GN"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,8 +8,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50-GN"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,8 +8,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50-GN"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,9 +8,9 @@ MODEL:
   WEIGHT: "" # no pretrained model
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
     FREEZE_CONV_BODY_AT: 0 # finetune all layers
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,9 +8,9 @@ MODEL:
   WEIGHT: "" # no pretrained model
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
     FREEZE_CONV_BODY_AT: 0 # finetune all layers
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,9 +8,9 @@ MODEL:
   WEIGHT: "" # no pretrained model
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
     FREEZE_CONV_BODY_AT: 0 # finetune all layers
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
 INPUT:
-  MIN_SIZE_TRAIN: 800
+  MIN_SIZE_TRAIN: (800,)
   MAX_SIZE_TRAIN: 1333
   MIN_SIZE_TEST: 800
   MAX_SIZE_TEST: 1333
@@ -8,9 +8,9 @@ MODEL:
   WEIGHT: "" # no pretrained model
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
     FREEZE_CONV_BODY_AT: 0 # finetune all layers
   RESNETS: # use GN for backbone
+    BACKBONE_OUT_CHANNELS: 256
     STRIDE_IN_1X1: False
     TRANS_FUNC: "BottleneckWithGN"
     STEM_FUNC: "StemWithGN"
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -3,7 +3,8 @@ MODEL:
   WEIGHT: "catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d"
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -4,7 +4,8 @@ MODEL:
   RPN_ONLY: True
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -5,7 +5,8 @@ MODEL:
   RETINANET_ON: True
   BACKBONE:
     CONV_BODY: "R-101-FPN-RETINANET"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     FG_IOU_THRESHOLD: 0.5
...
@@ -5,7 +5,8 @@ MODEL:
   RETINANET_ON: True
   BACKBONE:
     CONV_BODY: "R-101-FPN-RETINANET"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     FG_IOU_THRESHOLD: 0.5
...
@@ -5,7 +5,8 @@ MODEL:
   RETINANET_ON: True
   BACKBONE:
     CONV_BODY: "R-50-FPN-RETINANET"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     FG_IOU_THRESHOLD: 0.5
...
@@ -5,7 +5,8 @@ MODEL:
   RETINANET_ON: True
   BACKBONE:
     CONV_BODY: "R-50-FPN-RETINANET"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     FG_IOU_THRESHOLD: 0.5
...
@@ -5,7 +5,8 @@ MODEL:
   RETINANET_ON: True
   BACKBONE:
     CONV_BODY: "R-50-FPN-RETINANET"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     FG_IOU_THRESHOLD: 0.5
...
@@ -5,7 +5,8 @@ MODEL:
   RETINANET_ON: True
   BACKBONE:
     CONV_BODY: "R-101-FPN-RETINANET"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     FG_IOU_THRESHOLD: 0.5
...
@@ -4,7 +4,8 @@ MODEL:
   RPN_ONLY: True
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -4,7 +4,8 @@ MODEL:
   RPN_ONLY: True
   BACKBONE:
     CONV_BODY: "R-50-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -4,7 +4,8 @@ MODEL:
   RPN_ONLY: True
   BACKBONE:
     CONV_BODY: "R-101-FPN"
-    OUT_CHANNELS: 256
+  RESNETS:
+    BACKBONE_OUT_CHANNELS: 256
   RPN:
     USE_FPN: True
     ANCHOR_STRIDE: (4, 8, 16, 32, 64)
...
@@ -92,7 +92,6 @@ _C.MODEL.BACKBONE.CONV_BODY = "R-50-C4"
 # Add StopGrad at a specified stage so the bottom layers are frozen
 _C.MODEL.BACKBONE.FREEZE_CONV_BODY_AT = 2
-_C.MODEL.BACKBONE.OUT_CHANNELS = 256 * 4
 # GN for backbone
 _C.MODEL.BACKBONE.USE_GN = False
@@ -271,6 +270,7 @@ _C.MODEL.RESNETS.STEM_FUNC = "StemWithFixedBatchNorm"
 # Apply dilation in stage "res5"
 _C.MODEL.RESNETS.RES5_DILATION = 1
+_C.MODEL.RESNETS.BACKBONE_OUT_CHANNELS = 256 * 4
 _C.MODEL.RESNETS.RES2_OUT_CHANNELS = 256
 _C.MODEL.RESNETS.STEM_OUT_CHANNELS = 64
@@ -335,6 +335,44 @@ _C.MODEL.RETINANET.INFERENCE_TH = 0.05
 # NMS threshold used in RetinaNet
 _C.MODEL.RETINANET.NMS_TH = 0.4
# ---------------------------------------------------------------------------- #
# FBNet options
# ---------------------------------------------------------------------------- #
_C.MODEL.FBNET = CN()
_C.MODEL.FBNET.ARCH = "default"
# custom arch
_C.MODEL.FBNET.ARCH_DEF = ""
_C.MODEL.FBNET.BN_TYPE = "bn"
_C.MODEL.FBNET.SCALE_FACTOR = 1.0
# the output channels will be divisible by WIDTH_DIVISOR
_C.MODEL.FBNET.WIDTH_DIVISOR = 1
_C.MODEL.FBNET.DW_CONV_SKIP_BN = True
_C.MODEL.FBNET.DW_CONV_SKIP_RELU = True
# > 0 scale, == 0 skip, < 0 same dimension
_C.MODEL.FBNET.DET_HEAD_LAST_SCALE = 1.0
_C.MODEL.FBNET.DET_HEAD_BLOCKS = []
# overwrite the stride for the head, 0 to use original value
_C.MODEL.FBNET.DET_HEAD_STRIDE = 0
# > 0 scale, == 0 skip, < 0 same dimension
_C.MODEL.FBNET.KPTS_HEAD_LAST_SCALE = 0.0
_C.MODEL.FBNET.KPTS_HEAD_BLOCKS = []
# overwrite the stride for the head, 0 to use original value
_C.MODEL.FBNET.KPTS_HEAD_STRIDE = 0
# > 0 scale, == 0 skip, < 0 same dimension
_C.MODEL.FBNET.MASK_HEAD_LAST_SCALE = 0.0
_C.MODEL.FBNET.MASK_HEAD_BLOCKS = []
# overwrite the stride for the head, 0 to use original value
_C.MODEL.FBNET.MASK_HEAD_STRIDE = 0
# 0 to use all blocks defined in arch_def
_C.MODEL.FBNET.RPN_HEAD_BLOCKS = 0
_C.MODEL.FBNET.RPN_BN_TYPE = ""
 # ---------------------------------------------------------------------------- #
 # Solver
 # ---------------------------------------------------------------------------- #
...
@@ -4,6 +4,7 @@ import torch
 from .batch_norm import FrozenBatchNorm2d
 from .misc import Conv2d
 from .misc import ConvTranspose2d
+from .misc import BatchNorm2d
 from .misc import interpolate
 from .nms import nms
 from .roi_align import ROIAlign
@@ -15,6 +16,6 @@ from .sigmoid_focal_loss import SigmoidFocalLoss
 __all__ = ["nms", "roi_align", "ROIAlign", "roi_pool", "ROIPool",
            "smooth_l1_loss", "Conv2d", "ConvTranspose2d", "interpolate",
-           "FrozenBatchNorm2d", "SigmoidFocalLoss"
+           "BatchNorm2d", "FrozenBatchNorm2d", "SigmoidFocalLoss"
           ]
@@ -26,7 +26,6 @@ class _NewEmptyTensorOp(torch.autograd.Function):
         return _NewEmptyTensorOp.apply(grad, shape), None
 class Conv2d(torch.nn.Conv2d):
     def forward(self, x):
         if x.numel() > 0:
@@ -64,6 +63,15 @@ class ConvTranspose2d(torch.nn.ConvTranspose2d):
         return _NewEmptyTensorOp.apply(x, output_shape)
+class BatchNorm2d(torch.nn.BatchNorm2d):
+    def forward(self, x):
+        if x.numel() > 0:
+            return super(BatchNorm2d, self).forward(x)
+        # get output shape
+        output_shape = x.shape
+        return _NewEmptyTensorOp.apply(x, output_shape)
+
 def interpolate(
     input, size=None, scale_factor=None, mode="nearest", align_corners=None
 ):
...
 # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
 from .backbone import build_backbone
+from . import fbnet
@@ -16,6 +16,7 @@ from . import resnet
 def build_resnet_backbone(cfg):
     body = resnet.ResNet(cfg)
     model = nn.Sequential(OrderedDict([("body", body)]))
+    model.out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS
     return model
@@ -25,7 +26,7 @@ def build_resnet_backbone(cfg):
 def build_resnet_fpn_backbone(cfg):
     body = resnet.ResNet(cfg)
     in_channels_stage2 = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
-    out_channels = cfg.MODEL.BACKBONE.OUT_CHANNELS
+    out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS
     fpn = fpn_module.FPN(
         in_channels_list=[
             in_channels_stage2,
@@ -40,14 +41,16 @@ def build_resnet_fpn_backbone(cfg):
         top_blocks=fpn_module.LastLevelMaxPool(),
     )
     model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
+    model.out_channels = out_channels
     return model
 @registry.BACKBONES.register("R-50-FPN-RETINANET")
 @registry.BACKBONES.register("R-101-FPN-RETINANET")
 def build_resnet_fpn_p3p7_backbone(cfg):
     body = resnet.ResNet(cfg)
     in_channels_stage2 = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
-    out_channels = cfg.MODEL.BACKBONE.OUT_CHANNELS
+    out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS
     in_channels_p6p7 = in_channels_stage2 * 8 if cfg.MODEL.RETINANET.USE_C5 \
         else out_channels
     fpn = fpn_module.FPN(
@@ -64,8 +67,10 @@ def build_resnet_fpn_p3p7_backbone(cfg):
         top_blocks=fpn_module.LastLevelP6P7(in_channels_p6p7, out_channels),
     )
     model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
+    model.out_channels = out_channels
     return model
 def build_backbone(cfg):
     assert cfg.MODEL.BACKBONE.CONV_BODY in registry.BACKBONES, \
         "cfg.MODEL.BACKBONE.CONV_BODY: {} are not registered in registry".format(
...
from __future__ import absolute_import, division, print_function, unicode_literals
import copy
import json
import logging
from collections import OrderedDict
from . import (
fbnet_builder as mbuilder,
fbnet_modeldef as modeldef,
)
import torch.nn as nn
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.modeling.rpn import rpn
from maskrcnn_benchmark.modeling import poolers
logger = logging.getLogger(__name__)
def create_builder(cfg):
bn_type = cfg.MODEL.FBNET.BN_TYPE
if bn_type == "gn":
bn_type = (bn_type, cfg.GROUP_NORM.NUM_GROUPS)
factor = cfg.MODEL.FBNET.SCALE_FACTOR
arch = cfg.MODEL.FBNET.ARCH
arch_def = cfg.MODEL.FBNET.ARCH_DEF
if len(arch_def) > 0:
arch_def = json.loads(arch_def)
if arch in modeldef.MODEL_ARCH:
if len(arch_def) > 0:
assert (
arch_def == modeldef.MODEL_ARCH[arch]
), "Two architectures with the same name {},\n{},\n{}".format(
arch, arch_def, modeldef.MODEL_ARCH[arch]
)
arch_def = modeldef.MODEL_ARCH[arch]
else:
assert arch_def is not None and len(arch_def) > 0
arch_def = mbuilder.unify_arch_def(arch_def)
rpn_stride = arch_def.get("rpn_stride", None)
if rpn_stride is not None:
assert (
cfg.MODEL.RPN.ANCHOR_STRIDE[0] == rpn_stride
), "Needs to set cfg.MODEL.RPN.ANCHOR_STRIDE to {}, got {}".format(
rpn_stride, cfg.MODEL.RPN.ANCHOR_STRIDE
)
width_divisor = cfg.MODEL.FBNET.WIDTH_DIVISOR
dw_skip_bn = cfg.MODEL.FBNET.DW_CONV_SKIP_BN
dw_skip_relu = cfg.MODEL.FBNET.DW_CONV_SKIP_RELU
logger.info(
"Building fbnet model with arch {} (without scaling):\n{}".format(
arch, arch_def
)
)
builder = mbuilder.FBNetBuilder(
width_ratio=factor,
bn_type=bn_type,
width_divisor=width_divisor,
dw_skip_bn=dw_skip_bn,
dw_skip_relu=dw_skip_relu,
)
return builder, arch_def
def _get_trunk_cfg(arch_def):
""" Get all stages except the last one """
num_stages = mbuilder.get_num_stages(arch_def)
trunk_stages = arch_def.get("backbone", range(num_stages - 1))
ret = mbuilder.get_blocks(arch_def, stage_indices=trunk_stages)
return ret
class FBNetTrunk(nn.Module):
def __init__(
self, builder, arch_def, dim_in,
):
super(FBNetTrunk, self).__init__()
self.first = builder.add_first(arch_def["first"], dim_in=dim_in)
trunk_cfg = _get_trunk_cfg(arch_def)
self.stages = builder.add_blocks(trunk_cfg["stages"])
# return features for each stage
def forward(self, x):
y = self.first(x)
y = self.stages(y)
ret = [y]
return ret
@registry.BACKBONES.register("FBNet")
def add_conv_body(cfg, dim_in=3):
builder, arch_def = create_builder(cfg)
body = FBNetTrunk(builder, arch_def, dim_in)
model = nn.Sequential(OrderedDict([("body", body)]))
model.out_channels = builder.last_depth
return model
def _get_rpn_stage(arch_def, num_blocks):
rpn_stage = arch_def.get("rpn")
ret = mbuilder.get_blocks(arch_def, stage_indices=rpn_stage)
if num_blocks > 0:
logger.warn('Use last {} blocks in {} as rpn'.format(num_blocks, ret))
block_count = len(ret["stages"])
assert num_blocks <= block_count, "use block {}, block count {}".format(
num_blocks, block_count
)
blocks = range(block_count - num_blocks, block_count)
ret = mbuilder.get_blocks(ret, block_indices=blocks)
return ret["stages"]
class FBNetRPNHead(nn.Module):
def __init__(
self, cfg, in_channels, builder, arch_def,
):
super(FBNetRPNHead, self).__init__()
assert in_channels == builder.last_depth
rpn_bn_type = cfg.MODEL.FBNET.RPN_BN_TYPE
if len(rpn_bn_type) > 0:
builder.bn_type = rpn_bn_type
use_blocks = cfg.MODEL.FBNET.RPN_HEAD_BLOCKS
stages = _get_rpn_stage(arch_def, use_blocks)
self.head = builder.add_blocks(stages)
self.out_channels = builder.last_depth
def forward(self, x):
x = [self.head(y) for y in x]
return x
@registry.RPN_HEADS.register("FBNet.rpn_head")
def add_rpn_head(cfg, in_channels, num_anchors):
builder, model_arch = create_builder(cfg)
builder.last_depth = in_channels
assert in_channels == builder.last_depth
# builder.name_prefix = "[rpn]"
rpn_feature = FBNetRPNHead(cfg, in_channels, builder, model_arch)
rpn_regressor = rpn.RPNHeadConvRegressor(
cfg, rpn_feature.out_channels, num_anchors)
return nn.Sequential(rpn_feature, rpn_regressor)
def _get_head_stage(arch, head_name, blocks):
# use default name 'head' if the specific name 'head_name' does not existed
if head_name not in arch:
head_name = "head"
head_stage = arch.get(head_name)
ret = mbuilder.get_blocks(arch, stage_indices=head_stage, block_indices=blocks)
return ret["stages"]
# name mapping for head names in arch def and cfg
ARCH_CFG_NAME_MAPPING = {
"bbox": "ROI_BOX_HEAD",
"kpts": "ROI_KEYPOINT_HEAD",
"mask": "ROI_MASK_HEAD",
}
class FBNetROIHead(nn.Module):
def __init__(
self, cfg, in_channels, builder, arch_def,
head_name, use_blocks, stride_init, last_layer_scale,
):
super(FBNetROIHead, self).__init__()
assert in_channels == builder.last_depth
assert isinstance(use_blocks, list)
head_cfg_name = ARCH_CFG_NAME_MAPPING[head_name]
self.pooler = poolers.make_pooler(cfg, head_cfg_name)
stage = _get_head_stage(arch_def, head_name, use_blocks)
assert stride_init in [0, 1, 2]
if stride_init != 0:
stage[0]["block"][3] = stride_init
blocks = builder.add_blocks(stage)
last_info = copy.deepcopy(arch_def["last"])
last_info[1] = last_layer_scale
last = builder.add_last(last_info)
self.head = nn.Sequential(OrderedDict([
("blocks", blocks),
("last", last)
]))
# output_blob = builder.add_final_pool(
# # model, output_blob, kernel_size=cfg.FAST_RCNN.ROI_XFORM_RESOLUTION)
# model,
# output_blob,
# kernel_size=int(cfg.FAST_RCNN.ROI_XFORM_RESOLUTION / stride_init),
# )
self.out_channels = builder.last_depth
def forward(self, x, proposals):
x = self.pooler(x, proposals)
x = self.head(x)
return x
@registry.ROI_BOX_FEATURE_EXTRACTORS.register("FBNet.roi_head")
def add_roi_head(cfg, in_channels):
builder, model_arch = create_builder(cfg)
builder.last_depth = in_channels
# builder.name_prefix = "_[bbox]_"
return FBNetROIHead(
cfg, in_channels, builder, model_arch,
head_name="bbox",
use_blocks=cfg.MODEL.FBNET.DET_HEAD_BLOCKS,
stride_init=cfg.MODEL.FBNET.DET_HEAD_STRIDE,
last_layer_scale=cfg.MODEL.FBNET.DET_HEAD_LAST_SCALE,
)
@registry.ROI_KEYPOINT_FEATURE_EXTRACTORS.register("FBNet.roi_head_keypoints")
def add_roi_head_keypoints(cfg, in_channels):
builder, model_arch = create_builder(cfg)
builder.last_depth = in_channels
# builder.name_prefix = "_[kpts]_"
return FBNetROIHead(
cfg, in_channels, builder, model_arch,
head_name="kpts",
use_blocks=cfg.MODEL.FBNET.KPTS_HEAD_BLOCKS,
stride_init=cfg.MODEL.FBNET.KPTS_HEAD_STRIDE,
last_layer_scale=cfg.MODEL.FBNET.KPTS_HEAD_LAST_SCALE,
)
@registry.ROI_MASK_FEATURE_EXTRACTORS.register("FBNet.roi_head_mask")
def add_roi_head_mask(cfg, in_channels):
builder, model_arch = create_builder(cfg)
builder.last_depth = in_channels
# builder.name_prefix = "_[mask]_"
return FBNetROIHead(
cfg, in_channels, builder, model_arch,
head_name="mask",
use_blocks=cfg.MODEL.FBNET.MASK_HEAD_BLOCKS,
stride_init=cfg.MODEL.FBNET.MASK_HEAD_STRIDE,
last_layer_scale=cfg.MODEL.FBNET.MASK_HEAD_LAST_SCALE,
)
from __future__ import absolute_import, division, print_function, unicode_literals
def add_archs(archs):
global MODEL_ARCH
for x in archs:
assert x not in MODEL_ARCH, "Duplicated model name {} existed".format(x)
MODEL_ARCH[x] = archs[x]
MODEL_ARCH = {
"default": {
"block_op_type": [
# stage 0
["ir_k3"],
# stage 1
["ir_k3"] * 2,
# stage 2
["ir_k3"] * 3,
# stage 3
["ir_k3"] * 7,
# stage 4, bbox head
["ir_k3"] * 4,
# stage 5, rpn
["ir_k3"] * 3,
# stage 5, mask head
["ir_k3"] * 5,
],
"block_cfg": {
"first": [32, 2],
"stages": [
# [t, c, n, s]
# stage 0
[[1, 16, 1, 1]],
# stage 1
[[6, 24, 2, 2]],
# stage 2
[[6, 32, 3, 2]],
# stage 3
[[6, 64, 4, 2], [6, 96, 3, 1]],
# stage 4, bbox head
[[4, 160, 1, 2], [6, 160, 2, 1], [6, 240, 1, 1]],
# [[6, 160, 3, 2], [6, 320, 1, 1]],
# stage 5, rpn head
[[6, 96, 3, 1]],
# stage 6, mask head
[[4, 160, 1, 1], [6, 160, 3, 1], [3, 80, 1, -2]],
],
# [c, channel_scale]
"last": [1280, 0.0],
"backbone": [0, 1, 2, 3],
"rpn": [5],
"bbox": [4],
"mask": [6],
},
},
"xirb16d_dsmask": {
"block_op_type": [
# stage 0
["ir_k3"],
# stage 1
["ir_k3"] * 2,
# stage 2
["ir_k3"] * 3,
# stage 3
["ir_k3"] * 7,
# stage 4, bbox head
["ir_k3"] * 4,
# stage 5, mask head
["ir_k3"] * 5,
# stage 6, rpn
["ir_k3"] * 3,
],
"block_cfg": {
"first": [16, 2],
"stages": [
# [t, c, n, s]
# stage 0
[[1, 16, 1, 1]],
# stage 1
[[6, 32, 2, 2]],
# stage 2
[[6, 48, 3, 2]],
# stage 3
[[6, 96, 4, 2], [6, 128, 3, 1]],
# stage 4, bbox head
[[4, 128, 1, 2], [6, 128, 2, 1], [6, 160, 1, 1]],
# stage 5, mask head
[[4, 128, 1, 2], [6, 128, 2, 1], [6, 128, 1, -2], [3, 64, 1, -2]],
# stage 6, rpn head
[[6, 128, 3, 1]],
],
# [c, channel_scale]
"last": [1280, 0.0],
"backbone": [0, 1, 2, 3],
"rpn": [6],
"bbox": [4],
"mask": [5],
},
},
"mobilenet_v2": {
"block_op_type": [
# stage 0
["ir_k3"],
# stage 1
["ir_k3"] * 2,
# stage 2
["ir_k3"] * 3,
# stage 3
["ir_k3"] * 7,
# stage 4
["ir_k3"] * 4,
],
"block_cfg": {
"first": [32, 2],
"stages": [
# [t, c, n, s]
# stage 0
[[1, 16, 1, 1]],
# stage 1
[[6, 24, 2, 2]],
# stage 2
[[6, 32, 3, 2]],
# stage 3
[[6, 64, 4, 2], [6, 96, 3, 1]],
# stage 4
[[6, 160, 3, 1], [6, 320, 1, 1]],
],
# [c, channel_scale]
"last": [1280, 0.0],
"backbone": [0, 1, 2, 3],
"bbox": [4],
},
},
}
@@ -187,6 +187,7 @@ class ResNetHead(nn.Module):
             stride = None
             self.add_module(name, module)
             self.stages.append(name)
+        self.out_channels = out_channels
     def forward(self, x):
         for stage in self.stages:
...
@@ -27,8 +27,8 @@ class GeneralizedRCNN(nn.Module):
         super(GeneralizedRCNN, self).__init__()
         self.backbone = build_backbone(cfg)
-        self.rpn = build_rpn(cfg)
-        self.roi_heads = build_roi_heads(cfg)
+        self.rpn = build_rpn(cfg, self.backbone.out_channels)
+        self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)
     def forward(self, images, targets=None):
         """
...
@@ -119,3 +119,15 @@ class Pooler(nn.Module):
             result[idx_in_level] = pooler(per_level_feature, rois_per_level)
         return result
+
+
+def make_pooler(cfg, head_name):
+    resolution = cfg.MODEL[head_name].POOLER_RESOLUTION
+    scales = cfg.MODEL[head_name].POOLER_SCALES
+    sampling_ratio = cfg.MODEL[head_name].POOLER_SAMPLING_RATIO
+    pooler = Pooler(
+        output_size=(resolution, resolution),
+        scales=scales,
+        sampling_ratio=sampling_ratio,
+    )
+    return pooler
@@ -3,6 +3,10 @@
 from maskrcnn_benchmark.utils.registry import Registry
 BACKBONES = Registry()
+RPN_HEADS = Registry()
 ROI_BOX_FEATURE_EXTRACTORS = Registry()
 ROI_BOX_PREDICTOR = Registry()
-RPN_HEADS = Registry()
+ROI_KEYPOINT_FEATURE_EXTRACTORS = Registry()
+ROI_KEYPOINT_PREDICTOR = Registry()
+ROI_MASK_FEATURE_EXTRACTORS = Registry()
+ROI_MASK_PREDICTOR = Registry()
@@ -13,10 +13,11 @@ class ROIBoxHead(torch.nn.Module):
     Generic Box Head class.
     """
-    def __init__(self, cfg):
+    def __init__(self, cfg, in_channels):
         super(ROIBoxHead, self).__init__()
-        self.feature_extractor = make_roi_box_feature_extractor(cfg)
+        self.feature_extractor = make_roi_box_feature_extractor(cfg, in_channels)
-        self.predictor = make_roi_box_predictor(cfg)
+        self.predictor = make_roi_box_predictor(
+            cfg, self.feature_extractor.out_channels)
         self.post_processor = make_roi_box_post_processor(cfg)
         self.loss_evaluator = make_roi_box_loss_evaluator(cfg)
@@ -61,10 +62,10 @@ class ROIBoxHead(torch.nn.Module):
         )
-def build_roi_box_head(cfg):
+def build_roi_box_head(cfg, in_channels):
     """
     Constructs a new box head.
     By default, uses ROIBoxHead, but if it turns out not to be enough, just register a new class
     and make it a parameter in the config
     """
-    return ROIBoxHead(cfg)
+    return ROIBoxHead(cfg, in_channels)
@@ -12,7 +12,7 @@ from maskrcnn_benchmark.modeling.make_layers import make_fc
 @registry.ROI_BOX_FEATURE_EXTRACTORS.register("ResNet50Conv5ROIFeatureExtractor")
 class ResNet50Conv5ROIFeatureExtractor(nn.Module):
-    def __init__(self, config):
+    def __init__(self, config, in_channels):
         super(ResNet50Conv5ROIFeatureExtractor, self).__init__()
         resolution = config.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
@@ -38,6 +38,7 @@ class ResNet50Conv5ROIFeatureExtractor(nn.Module):
         self.pooler = pooler
         self.head = head
+        self.out_channels = head.out_channels
     def forward(self, x, proposals):
         x = self.pooler(x, proposals)
@@ -51,7 +52,7 @@ class FPN2MLPFeatureExtractor(nn.Module):
     Heads for FPN for classification
     """
-    def __init__(self, cfg):
+    def __init__(self, cfg, in_channels):
         super(FPN2MLPFeatureExtractor, self).__init__()
         resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
@@ -62,12 +63,13 @@ class FPN2MLPFeatureExtractor(nn.Module):
             scales=scales,
             sampling_ratio=sampling_ratio,
         )
-        input_size = cfg.MODEL.BACKBONE.OUT_CHANNELS * resolution ** 2
+        input_size = in_channels * resolution ** 2
         representation_size = cfg.MODEL.ROI_BOX_HEAD.MLP_HEAD_DIM
         use_gn = cfg.MODEL.ROI_BOX_HEAD.USE_GN
         self.pooler = pooler
         self.fc6 = make_fc(input_size, representation_size, use_gn)
         self.fc7 = make_fc(representation_size, representation_size, use_gn)
+        self.out_channels = representation_size
     def forward(self, x, proposals):
         x = self.pooler(x, proposals)
@@ -85,7 +87,7 @@ class FPNXconv1fcFeatureExtractor(nn.Module):
     Heads for FPN for classification
     """
-    def __init__(self, cfg):
+    def __init__(self, cfg, in_channels):
         super(FPNXconv1fcFeatureExtractor, self).__init__()
         resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
@@ -99,7 +101,6 @@ class FPNXconv1fcFeatureExtractor(nn.Module):
         self.pooler = pooler
         use_gn = cfg.MODEL.ROI_BOX_HEAD.USE_GN
-        in_channels = cfg.MODEL.BACKBONE.OUT_CHANNELS
         conv_head_dim = cfg.MODEL.ROI_BOX_HEAD.CONV_HEAD_DIM
         num_stacked_convs = cfg.MODEL.ROI_BOX_HEAD.NUM_STACKED_CONVS
         dilation = cfg.MODEL.ROI_BOX_HEAD.DILATION
@@ -133,6 +134,7 @@ class FPNXconv1fcFeatureExtractor(nn.Module):
         input_size = conv_head_dim * resolution ** 2
         representation_size = cfg.MODEL.ROI_BOX_HEAD.MLP_HEAD_DIM
         self.fc6 = make_fc(input_size, representation_size, use_gn=False)
+        self.out_channels = representation_size
     def forward(self, x, proposals):
         x = self.pooler(x, proposals)
@@ -142,8 +144,8 @@ class FPNXconv1fcFeatureExtractor(nn.Module):
         return x
-def make_roi_box_feature_extractor(cfg):
+def make_roi_box_feature_extractor(cfg, in_channels):
     func = registry.ROI_BOX_FEATURE_EXTRACTORS[
         cfg.MODEL.ROI_BOX_HEAD.FEATURE_EXTRACTOR
     ]
-    return func(cfg)
+    return func(cfg, in_channels)
@@ -5,16 +5,14 @@ from torch import nn
 @registry.ROI_BOX_PREDICTOR.register("FastRCNNPredictor")
 class FastRCNNPredictor(nn.Module):
-    def __init__(self, config, pretrained=None):
+    def __init__(self, config, in_channels):
         super(FastRCNNPredictor, self).__init__()
-        stage_index = 4
-        stage2_relative_factor = 2 ** (stage_index - 1)
-        res2_out_channels = config.MODEL.RESNETS.RES2_OUT_CHANNELS
-        num_inputs = res2_out_channels * stage2_relative_factor
+        assert in_channels is not None
+        num_inputs = in_channels
         num_classes = config.MODEL.ROI_BOX_HEAD.NUM_CLASSES
-        self.avgpool = nn.AvgPool2d(kernel_size=7, stride=7)
+        self.avgpool = nn.AdaptiveAvgPool2d(1)
         self.cls_score = nn.Linear(num_inputs, num_classes)
         num_bbox_reg_classes = 2 if config.MODEL.CLS_AGNOSTIC_BBOX_REG else num_classes
         self.bbox_pred = nn.Linear(num_inputs, num_bbox_reg_classes * 4)
@@ -35,10 +33,10 @@ class FastRCNNPredictor(nn.Module):
 @registry.ROI_BOX_PREDICTOR.register("FPNPredictor")
 class FPNPredictor(nn.Module):
-    def __init__(self, cfg):
+    def __init__(self, cfg, in_channels):
         super(FPNPredictor, self).__init__()
         num_classes = cfg.MODEL.ROI_BOX_HEAD.NUM_CLASSES
-        representation_size = cfg.MODEL.ROI_BOX_HEAD.MLP_HEAD_DIM
+        representation_size = in_channels
         self.cls_score = nn.Linear(representation_size, num_classes)
         num_bbox_reg_classes = 2 if cfg.MODEL.CLS_AGNOSTIC_BBOX_REG else num_classes
@@ -50,12 +48,15 @@ class FPNPredictor(nn.Module):
         nn.init.constant_(l.bias, 0)
     def forward(self, x):
+        if x.ndimension() == 4:
+            assert list(x.shape[2:]) == [1, 1]
+            x = x.view(x.size(0), -1)
         scores = self.cls_score(x)
         bbox_deltas = self.bbox_pred(x)
         return scores, bbox_deltas
-def make_roi_box_predictor(cfg):
+def make_roi_box_predictor(cfg, in_channels):
     func = registry.ROI_BOX_PREDICTOR[cfg.MODEL.ROI_BOX_HEAD.PREDICTOR]
-    return func(cfg)
+    return func(cfg, in_channels)
...@@ -7,11 +7,12 @@ from .loss import make_roi_keypoint_loss_evaluator ...@@ -7,11 +7,12 @@ from .loss import make_roi_keypoint_loss_evaluator
class ROIKeypointHead(torch.nn.Module): class ROIKeypointHead(torch.nn.Module):
def __init__(self, cfg): def __init__(self, cfg, in_channels):
super(ROIKeypointHead, self).__init__() super(ROIKeypointHead, self).__init__()
self.cfg = cfg.clone() self.cfg = cfg.clone()
self.feature_extractor = make_roi_keypoint_feature_extractor(cfg) self.feature_extractor = make_roi_keypoint_feature_extractor(cfg, in_channels)
self.predictor = make_roi_keypoint_predictor(cfg) self.predictor = make_roi_keypoint_predictor(
cfg, self.feature_extractor.out_channels)
self.post_processor = make_roi_keypoint_post_processor(cfg) self.post_processor = make_roi_keypoint_post_processor(cfg)
self.loss_evaluator = make_roi_keypoint_loss_evaluator(cfg) self.loss_evaluator = make_roi_keypoint_loss_evaluator(cfg)
...@@ -46,5 +47,5 @@ class ROIKeypointHead(torch.nn.Module): ...@@ -46,5 +47,5 @@ class ROIKeypointHead(torch.nn.Module):
return x, proposals, dict(loss_kp=loss_kp) return x, proposals, dict(loss_kp=loss_kp)
def build_roi_keypoint_head(cfg): def build_roi_keypoint_head(cfg, in_channels):
return ROIKeypointHead(cfg) return ROIKeypointHead(cfg, in_channels)
from torch import nn from torch import nn
from torch.nn import functional as F from torch.nn import functional as F
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.modeling.poolers import Pooler from maskrcnn_benchmark.modeling.poolers import Pooler
from maskrcnn_benchmark.layers import Conv2d from maskrcnn_benchmark.layers import Conv2d
@registry.ROI_KEYPOINT_FEATURE_EXTRACTORS.register("KeypointRCNNFeatureExtractor")
class KeypointRCNNFeatureExtractor(nn.Module): class KeypointRCNNFeatureExtractor(nn.Module):
def __init__(self, cfg): def __init__(self, cfg, in_channels):
super(KeypointRCNNFeatureExtractor, self).__init__() super(KeypointRCNNFeatureExtractor, self).__init__()
resolution = cfg.MODEL.ROI_KEYPOINT_HEAD.POOLER_RESOLUTION resolution = cfg.MODEL.ROI_KEYPOINT_HEAD.POOLER_RESOLUTION
...@@ -20,7 +22,7 @@ class KeypointRCNNFeatureExtractor(nn.Module): ...@@ -20,7 +22,7 @@ class KeypointRCNNFeatureExtractor(nn.Module):
) )
self.pooler = pooler self.pooler = pooler
input_features = cfg.MODEL.BACKBONE.OUT_CHANNELS input_features = in_channels
layers = cfg.MODEL.ROI_KEYPOINT_HEAD.CONV_LAYERS layers = cfg.MODEL.ROI_KEYPOINT_HEAD.CONV_LAYERS
next_feature = input_features next_feature = input_features
self.blocks = [] self.blocks = []
...@@ -32,6 +34,7 @@ class KeypointRCNNFeatureExtractor(nn.Module): ...@@ -32,6 +34,7 @@ class KeypointRCNNFeatureExtractor(nn.Module):
self.add_module(layer_name, module) self.add_module(layer_name, module)
next_feature = layer_features next_feature = layer_features
self.blocks.append(layer_name) self.blocks.append(layer_name)
self.out_channels = layer_features
def forward(self, x, proposals): def forward(self, x, proposals):
x = self.pooler(x, proposals) x = self.pooler(x, proposals)
...@@ -40,13 +43,8 @@ class KeypointRCNNFeatureExtractor(nn.Module): ...@@ -40,13 +43,8 @@ class KeypointRCNNFeatureExtractor(nn.Module):
return x return x
_ROI_KEYPOINT_FEATURE_EXTRACTORS = { def make_roi_keypoint_feature_extractor(cfg, in_channels):
"KeypointRCNNFeatureExtractor": KeypointRCNNFeatureExtractor func = registry.ROI_KEYPOINT_FEATURE_EXTRACTORS[
}
def make_roi_keypoint_feature_extractor(cfg):
func = _ROI_KEYPOINT_FEATURE_EXTRACTORS[
cfg.MODEL.ROI_KEYPOINT_HEAD.FEATURE_EXTRACTOR cfg.MODEL.ROI_KEYPOINT_HEAD.FEATURE_EXTRACTOR
] ]
return func(cfg) return func(cfg, in_channels)
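# Illustrative sketch (hypothetical "TinyKeypointExtractor") of what the
# ROI_KEYPOINT_FEATURE_EXTRACTORS registry above expects from a custom extractor:
# a (cfg, in_channels) constructor and an `out_channels` attribute that
# ROIKeypointHead forwards to the predictor builder.
from torch import nn
from maskrcnn_benchmark.modeling import registry


@registry.ROI_KEYPOINT_FEATURE_EXTRACTORS.register("TinyKeypointExtractor")
class TinyKeypointExtractor(nn.Module):
    def __init__(self, cfg, in_channels):
        super(TinyKeypointExtractor, self).__init__()
        self.conv = nn.Conv2d(in_channels, 128, kernel_size=3, padding=1)
        self.out_channels = 128  # read via feature_extractor.out_channels

    def forward(self, x, proposals):
        # a real extractor would pool per-proposal features first (see the
        # Pooler usage above); this sketch only exercises the interface
        return nn.functional.relu(self.conv(x))

# Selecting it is then a config change:
#   cfg.MODEL.ROI_KEYPOINT_HEAD.FEATURE_EXTRACTOR = "TinyKeypointExtractor"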
from torch import nn from torch import nn
from torch.nn import functional as F
from maskrcnn_benchmark import layers from maskrcnn_benchmark import layers
from maskrcnn_benchmark.modeling import registry
@registry.ROI_KEYPOINT_PREDICTOR.register("KeypointRCNNPredictor")
class KeypointRCNNPredictor(nn.Module): class KeypointRCNNPredictor(nn.Module):
def __init__(self, cfg): def __init__(self, cfg, in_channels):
super(KeypointRCNNPredictor, self).__init__() super(KeypointRCNNPredictor, self).__init__()
input_features = cfg.MODEL.ROI_KEYPOINT_HEAD.CONV_LAYERS[-1] input_features = in_channels
num_keypoints = cfg.MODEL.ROI_KEYPOINT_HEAD.NUM_CLASSES num_keypoints = cfg.MODEL.ROI_KEYPOINT_HEAD.NUM_CLASSES
deconv_kernel = 4 deconv_kernel = 4
self.kps_score_lowres = layers.ConvTranspose2d( self.kps_score_lowres = layers.ConvTranspose2d(
...@@ -22,6 +23,7 @@ class KeypointRCNNPredictor(nn.Module): ...@@ -22,6 +23,7 @@ class KeypointRCNNPredictor(nn.Module):
) )
nn.init.constant_(self.kps_score_lowres.bias, 0) nn.init.constant_(self.kps_score_lowres.bias, 0)
self.up_scale = 2 self.up_scale = 2
self.out_channels = num_keypoints
def forward(self, x): def forward(self, x):
x = self.kps_score_lowres(x) x = self.kps_score_lowres(x)
...@@ -31,9 +33,6 @@ class KeypointRCNNPredictor(nn.Module): ...@@ -31,9 +33,6 @@ class KeypointRCNNPredictor(nn.Module):
return x return x
_ROI_KEYPOINT_PREDICTOR = {"KeypointRCNNPredictor": KeypointRCNNPredictor} def make_roi_keypoint_predictor(cfg, in_channels):
func = registry.ROI_KEYPOINT_PREDICTOR[cfg.MODEL.ROI_KEYPOINT_HEAD.PREDICTOR]
return func(cfg, in_channels)
def make_roi_keypoint_predictor(cfg):
func = _ROI_KEYPOINT_PREDICTOR[cfg.MODEL.ROI_KEYPOINT_HEAD.PREDICTOR]
return func(cfg)
...@@ -34,11 +34,12 @@ def keep_only_positive_boxes(boxes): ...@@ -34,11 +34,12 @@ def keep_only_positive_boxes(boxes):
class ROIMaskHead(torch.nn.Module): class ROIMaskHead(torch.nn.Module):
def __init__(self, cfg): def __init__(self, cfg, in_channels):
super(ROIMaskHead, self).__init__() super(ROIMaskHead, self).__init__()
self.cfg = cfg.clone() self.cfg = cfg.clone()
self.feature_extractor = make_roi_mask_feature_extractor(cfg) self.feature_extractor = make_roi_mask_feature_extractor(cfg, in_channels)
self.predictor = make_roi_mask_predictor(cfg) self.predictor = make_roi_mask_predictor(
cfg, self.feature_extractor.out_channels)
self.post_processor = make_roi_mask_post_processor(cfg) self.post_processor = make_roi_mask_post_processor(cfg)
self.loss_evaluator = make_roi_mask_loss_evaluator(cfg) self.loss_evaluator = make_roi_mask_loss_evaluator(cfg)
...@@ -78,5 +79,5 @@ class ROIMaskHead(torch.nn.Module): ...@@ -78,5 +79,5 @@ class ROIMaskHead(torch.nn.Module):
return x, all_proposals, dict(loss_mask=loss_mask) return x, all_proposals, dict(loss_mask=loss_mask)
def build_roi_mask_head(cfg): def build_roi_mask_head(cfg, in_channels):
return ROIMaskHead(cfg) return ROIMaskHead(cfg, in_channels)
...@@ -3,18 +3,23 @@ from torch import nn ...@@ -3,18 +3,23 @@ from torch import nn
from torch.nn import functional as F from torch.nn import functional as F
from ..box_head.roi_box_feature_extractors import ResNet50Conv5ROIFeatureExtractor from ..box_head.roi_box_feature_extractors import ResNet50Conv5ROIFeatureExtractor
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.modeling.poolers import Pooler from maskrcnn_benchmark.modeling.poolers import Pooler
from maskrcnn_benchmark.layers import Conv2d
from maskrcnn_benchmark.modeling.make_layers import make_conv3x3 from maskrcnn_benchmark.modeling.make_layers import make_conv3x3
registry.ROI_MASK_FEATURE_EXTRACTORS.register(
"ResNet50Conv5ROIFeatureExtractor", ResNet50Conv5ROIFeatureExtractor
)
@registry.ROI_MASK_FEATURE_EXTRACTORS.register("MaskRCNNFPNFeatureExtractor")
class MaskRCNNFPNFeatureExtractor(nn.Module): class MaskRCNNFPNFeatureExtractor(nn.Module):
""" """
Heads for FPN for classification Heads for FPN for classification
""" """
def __init__(self, cfg): def __init__(self, cfg, in_channels):
""" """
Arguments: Arguments:
num_classes (int): number of output classes num_classes (int): number of output classes
...@@ -31,7 +36,7 @@ class MaskRCNNFPNFeatureExtractor(nn.Module): ...@@ -31,7 +36,7 @@ class MaskRCNNFPNFeatureExtractor(nn.Module):
scales=scales, scales=scales,
sampling_ratio=sampling_ratio, sampling_ratio=sampling_ratio,
) )
input_size = cfg.MODEL.BACKBONE.OUT_CHANNELS input_size = in_channels
self.pooler = pooler self.pooler = pooler
use_gn = cfg.MODEL.ROI_MASK_HEAD.USE_GN use_gn = cfg.MODEL.ROI_MASK_HEAD.USE_GN
...@@ -42,12 +47,14 @@ class MaskRCNNFPNFeatureExtractor(nn.Module): ...@@ -42,12 +47,14 @@ class MaskRCNNFPNFeatureExtractor(nn.Module):
self.blocks = [] self.blocks = []
for layer_idx, layer_features in enumerate(layers, 1): for layer_idx, layer_features in enumerate(layers, 1):
layer_name = "mask_fcn{}".format(layer_idx) layer_name = "mask_fcn{}".format(layer_idx)
module = make_conv3x3(next_feature, layer_features, module = make_conv3x3(
next_feature, layer_features,
dilation=dilation, stride=1, use_gn=use_gn dilation=dilation, stride=1, use_gn=use_gn
) )
self.add_module(layer_name, module) self.add_module(layer_name, module)
next_feature = layer_features next_feature = layer_features
self.blocks.append(layer_name) self.blocks.append(layer_name)
self.out_channels = layer_features
def forward(self, x, proposals): def forward(self, x, proposals):
x = self.pooler(x, proposals) x = self.pooler(x, proposals)
...@@ -58,12 +65,8 @@ class MaskRCNNFPNFeatureExtractor(nn.Module): ...@@ -58,12 +65,8 @@ class MaskRCNNFPNFeatureExtractor(nn.Module):
return x return x
_ROI_MASK_FEATURE_EXTRACTORS = { def make_roi_mask_feature_extractor(cfg, in_channels):
"ResNet50Conv5ROIFeatureExtractor": ResNet50Conv5ROIFeatureExtractor, func = registry.ROI_MASK_FEATURE_EXTRACTORS[
"MaskRCNNFPNFeatureExtractor": MaskRCNNFPNFeatureExtractor, cfg.MODEL.ROI_MASK_HEAD.FEATURE_EXTRACTOR
} ]
return func(cfg, in_channels)
def make_roi_mask_feature_extractor(cfg):
func = _ROI_MASK_FEATURE_EXTRACTORS[cfg.MODEL.ROI_MASK_HEAD.FEATURE_EXTRACTOR]
return func(cfg)
...@@ -4,21 +4,16 @@ from torch.nn import functional as F ...@@ -4,21 +4,16 @@ from torch.nn import functional as F
from maskrcnn_benchmark.layers import Conv2d from maskrcnn_benchmark.layers import Conv2d
from maskrcnn_benchmark.layers import ConvTranspose2d from maskrcnn_benchmark.layers import ConvTranspose2d
from maskrcnn_benchmark.modeling import registry
@registry.ROI_MASK_PREDICTOR.register("MaskRCNNC4Predictor")
class MaskRCNNC4Predictor(nn.Module): class MaskRCNNC4Predictor(nn.Module):
def __init__(self, cfg): def __init__(self, cfg, in_channels):
super(MaskRCNNC4Predictor, self).__init__() super(MaskRCNNC4Predictor, self).__init__()
num_classes = cfg.MODEL.ROI_BOX_HEAD.NUM_CLASSES num_classes = cfg.MODEL.ROI_BOX_HEAD.NUM_CLASSES
dim_reduced = cfg.MODEL.ROI_MASK_HEAD.CONV_LAYERS[-1] dim_reduced = cfg.MODEL.ROI_MASK_HEAD.CONV_LAYERS[-1]
num_inputs = in_channels
if cfg.MODEL.ROI_HEADS.USE_FPN:
num_inputs = dim_reduced
else:
stage_index = 4
stage2_relative_factor = 2 ** (stage_index - 1)
res2_out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
num_inputs = res2_out_channels * stage2_relative_factor
self.conv5_mask = ConvTranspose2d(num_inputs, dim_reduced, 2, 2, 0) self.conv5_mask = ConvTranspose2d(num_inputs, dim_reduced, 2, 2, 0)
self.mask_fcn_logits = Conv2d(dim_reduced, num_classes, 1, 1, 0) self.mask_fcn_logits = Conv2d(dim_reduced, num_classes, 1, 1, 0)
...@@ -36,9 +31,27 @@ class MaskRCNNC4Predictor(nn.Module): ...@@ -36,9 +31,27 @@ class MaskRCNNC4Predictor(nn.Module):
return self.mask_fcn_logits(x) return self.mask_fcn_logits(x)
_ROI_MASK_PREDICTOR = {"MaskRCNNC4Predictor": MaskRCNNC4Predictor} @registry.ROI_MASK_PREDICTOR.register("MaskRCNNConv1x1Predictor")
class MaskRCNNConv1x1Predictor(nn.Module):
def __init__(self, cfg, in_channels):
super(MaskRCNNConv1x1Predictor, self).__init__()
num_classes = cfg.MODEL.ROI_BOX_HEAD.NUM_CLASSES
num_inputs = in_channels
self.mask_fcn_logits = Conv2d(num_inputs, num_classes, 1, 1, 0)
for name, param in self.named_parameters():
if "bias" in name:
nn.init.constant_(param, 0)
elif "weight" in name:
# Caffe2 implementation uses MSRAFill, which in fact
# corresponds to kaiming_normal_ in PyTorch
nn.init.kaiming_normal_(param, mode="fan_out", nonlinearity="relu")
def forward(self, x):
return self.mask_fcn_logits(x)
def make_roi_mask_predictor(cfg): def make_roi_mask_predictor(cfg, in_channels):
func = _ROI_MASK_PREDICTOR[cfg.MODEL.ROI_MASK_HEAD.PREDICTOR] func = registry.ROI_MASK_PREDICTOR[cfg.MODEL.ROI_MASK_HEAD.PREDICTOR]
return func(cfg) return func(cfg, in_channels)
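# Usage sketch for the registry-based predictors above: the predictor is chosen
# through cfg.MODEL.ROI_MASK_HEAD.PREDICTOR and built with whatever channel
# count the feature extractor reports (128 below is just an illustrative value).
from maskrcnn_benchmark.config import cfg as g_cfg
from maskrcnn_benchmark.modeling.roi_heads.mask_head.roi_mask_predictors import (
    make_roi_mask_predictor,
)

cfg = g_cfg.clone()
cfg.MODEL.ROI_MASK_HEAD.PREDICTOR = "MaskRCNNConv1x1Predictor"
predictor = make_roi_mask_predictor(cfg, in_channels=128)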
...@@ -55,7 +55,7 @@ class CombinedROIHeads(torch.nn.ModuleDict): ...@@ -55,7 +55,7 @@ class CombinedROIHeads(torch.nn.ModuleDict):
return x, detections, losses return x, detections, losses
def build_roi_heads(cfg): def build_roi_heads(cfg, in_channels):
# individually create the heads, that will be combined together # individually create the heads, that will be combined together
# afterwards # afterwards
roi_heads = [] roi_heads = []
...@@ -63,11 +63,11 @@ def build_roi_heads(cfg): ...@@ -63,11 +63,11 @@ def build_roi_heads(cfg):
return [] return []
if not cfg.MODEL.RPN_ONLY: if not cfg.MODEL.RPN_ONLY:
roi_heads.append(("box", build_roi_box_head(cfg))) roi_heads.append(("box", build_roi_box_head(cfg, in_channels)))
if cfg.MODEL.MASK_ON: if cfg.MODEL.MASK_ON:
roi_heads.append(("mask", build_roi_mask_head(cfg))) roi_heads.append(("mask", build_roi_mask_head(cfg, in_channels)))
if cfg.MODEL.KEYPOINT_ON: if cfg.MODEL.KEYPOINT_ON:
roi_heads.append(("keypoint", build_roi_keypoint_head(cfg))) roi_heads.append(("keypoint", build_roi_keypoint_head(cfg, in_channels)))
# combine individual heads in a single module # combine individual heads in a single module
if roi_heads: if roi_heads:
......
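# Sketch of how a model-building script can now wire the pieces together: the
# backbone advertises out_channels and the rpn / roi-head builders receive it,
# instead of everyone reading cfg.MODEL.BACKBONE.OUT_CHANNELS. The exact wiring
# inside GeneralizedRCNN may differ slightly; this only shows the new signatures.
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.modeling.backbone import build_backbone
from maskrcnn_benchmark.modeling.rpn.rpn import build_rpn
from maskrcnn_benchmark.modeling.roi_heads.roi_heads import build_roi_heads

backbone = build_backbone(cfg)
rpn = build_rpn(cfg, backbone.out_channels)
roi_heads = build_roi_heads(cfg, backbone.out_channels)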
...@@ -15,7 +15,7 @@ class RetinaNetHead(torch.nn.Module): ...@@ -15,7 +15,7 @@ class RetinaNetHead(torch.nn.Module):
Adds a RetinaNet head with classification and regression heads Adds a RetinaNet head with classification and regression heads
""" """
def __init__(self, cfg): def __init__(self, cfg, in_channels):
""" """
Arguments: Arguments:
in_channels (int): number of channels of the input feature in_channels (int): number of channels of the input feature
...@@ -24,7 +24,6 @@ class RetinaNetHead(torch.nn.Module): ...@@ -24,7 +24,6 @@ class RetinaNetHead(torch.nn.Module):
super(RetinaNetHead, self).__init__() super(RetinaNetHead, self).__init__()
# TODO: Implement the sigmoid version first. # TODO: Implement the sigmoid version first.
num_classes = cfg.MODEL.RETINANET.NUM_CLASSES - 1 num_classes = cfg.MODEL.RETINANET.NUM_CLASSES - 1
in_channels = cfg.MODEL.BACKBONE.OUT_CHANNELS
num_anchors = len(cfg.MODEL.RETINANET.ASPECT_RATIOS) \ num_anchors = len(cfg.MODEL.RETINANET.ASPECT_RATIOS) \
* cfg.MODEL.RETINANET.SCALES_PER_OCTAVE * cfg.MODEL.RETINANET.SCALES_PER_OCTAVE
...@@ -92,13 +91,13 @@ class RetinaNetModule(torch.nn.Module): ...@@ -92,13 +91,13 @@ class RetinaNetModule(torch.nn.Module):
RetinaNet outputs and losses. Only tested with FPN for now. RetinaNet outputs and losses. Only tested with FPN for now.
""" """
def __init__(self, cfg): def __init__(self, cfg, in_channels):
super(RetinaNetModule, self).__init__() super(RetinaNetModule, self).__init__()
self.cfg = cfg.clone() self.cfg = cfg.clone()
anchor_generator = make_anchor_generator_retinanet(cfg) anchor_generator = make_anchor_generator_retinanet(cfg)
head = RetinaNetHead(cfg) head = RetinaNetHead(cfg, in_channels)
box_coder = BoxCoder(weights=(10., 10., 5., 5.)) box_coder = BoxCoder(weights=(10., 10., 5., 5.))
box_selector_test = make_retinanet_postprocessor(cfg, box_coder, is_train=False) box_selector_test = make_retinanet_postprocessor(cfg, box_coder, is_train=False)
...@@ -149,5 +148,5 @@ class RetinaNetModule(torch.nn.Module): ...@@ -149,5 +148,5 @@ class RetinaNetModule(torch.nn.Module):
return boxes, {} return boxes, {}
def build_retinanet(cfg): def build_retinanet(cfg, in_channels):
return RetinaNetModule(cfg) return RetinaNetModule(cfg, in_channels)
...@@ -10,6 +10,66 @@ from .loss import make_rpn_loss_evaluator ...@@ -10,6 +10,66 @@ from .loss import make_rpn_loss_evaluator
from .anchor_generator import make_anchor_generator from .anchor_generator import make_anchor_generator
from .inference import make_rpn_postprocessor from .inference import make_rpn_postprocessor
class RPNHeadConvRegressor(nn.Module):
"""
A simple RPN Head for classification and bbox regression
"""
def __init__(self, cfg, in_channels, num_anchors):
"""
Arguments:
cfg : config
in_channels (int): number of channels of the input feature
num_anchors (int): number of anchors to be predicted
"""
super(RPNHeadConvRegressor, self).__init__()
self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
self.bbox_pred = nn.Conv2d(
in_channels, num_anchors * 4, kernel_size=1, stride=1
)
for l in [self.cls_logits, self.bbox_pred]:
torch.nn.init.normal_(l.weight, std=0.01)
torch.nn.init.constant_(l.bias, 0)
def forward(self, x):
assert isinstance(x, (list, tuple))
logits = [self.cls_logits(y) for y in x]
bbox_reg = [self.bbox_pred(y) for y in x]
return logits, bbox_reg
class RPNHeadFeatureSingleConv(nn.Module):
"""
Adds a simple RPN Head with one conv to extract the feature
"""
def __init__(self, cfg, in_channels):
"""
Arguments:
cfg : config
in_channels (int): number of channels of the input feature
"""
super(RPNHeadFeatureSingleConv, self).__init__()
self.conv = nn.Conv2d(
in_channels, in_channels, kernel_size=3, stride=1, padding=1
)
for l in [self.conv]:
torch.nn.init.normal_(l.weight, std=0.01)
torch.nn.init.constant_(l.bias, 0)
self.out_channels = in_channels
def forward(self, x):
assert isinstance(x, (list, tuple))
x = [F.relu(self.conv(z)) for z in x]
return x
@registry.RPN_HEADS.register("SingleConvRPNHead") @registry.RPN_HEADS.register("SingleConvRPNHead")
class RPNHead(nn.Module): class RPNHead(nn.Module):
""" """
...@@ -52,14 +112,13 @@ class RPNModule(torch.nn.Module): ...@@ -52,14 +112,13 @@ class RPNModule(torch.nn.Module):
proposals and losses. Works for both FPN and non-FPN. proposals and losses. Works for both FPN and non-FPN.
""" """
def __init__(self, cfg): def __init__(self, cfg, in_channels):
super(RPNModule, self).__init__() super(RPNModule, self).__init__()
self.cfg = cfg.clone() self.cfg = cfg.clone()
anchor_generator = make_anchor_generator(cfg) anchor_generator = make_anchor_generator(cfg)
in_channels = cfg.MODEL.BACKBONE.OUT_CHANNELS
rpn_head = registry.RPN_HEADS[cfg.MODEL.RPN.RPN_HEAD] rpn_head = registry.RPN_HEADS[cfg.MODEL.RPN.RPN_HEAD]
head = rpn_head( head = rpn_head(
cfg, in_channels, anchor_generator.num_anchors_per_location()[0] cfg, in_channels, anchor_generator.num_anchors_per_location()[0]
...@@ -138,11 +197,11 @@ class RPNModule(torch.nn.Module): ...@@ -138,11 +197,11 @@ class RPNModule(torch.nn.Module):
return boxes, {} return boxes, {}
def build_rpn(cfg): def build_rpn(cfg, in_channels):
""" """
This gives the gist of it. Not super important because it doesn't change as much This gives the gist of it. Not super important because it doesn't change as much
""" """
if cfg.MODEL.RETINANET_ON: if cfg.MODEL.RETINANET_ON:
return build_retinanet(cfg) return build_retinanet(cfg, in_channels)
return RPNModule(cfg) return RPNModule(cfg, in_channels)
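# Sketch chaining the two new RPN building blocks above; how they are composed
# for a particular architecture (e.g. FBNet) may differ, this only exercises
# their interfaces: both take and return a list of per-level feature maps.
import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.modeling.rpn.rpn import (
    RPNHeadConvRegressor,
    RPNHeadFeatureSingleConv,
)

in_channels, num_anchors = 64, 3
feature = RPNHeadFeatureSingleConv(cfg, in_channels)
regressor = RPNHeadConvRegressor(cfg, feature.out_channels, num_anchors)

fpn_features = [torch.rand(2, in_channels, 32, 32) for _ in range(3)]
objectness, box_regression = regressor(feature(fpn_features))
# objectness[i]: [2, num_anchors, 32, 32]; box_regression[i]: [2, num_anchors * 4, 32, 32]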
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import os
def get_config_root_path():
''' Path to configs for unit tests '''
# cur_file_dir is root/tests/env_tests
cur_file_dir = os.path.dirname(os.path.abspath(os.path.realpath(__file__)))
ret = os.path.dirname(os.path.dirname(cur_file_dir))
ret = os.path.join(ret, "configs")
return ret
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import copy
import torch
# import modules to register backbones
from maskrcnn_benchmark.modeling.backbone import build_backbone # NoQA
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.config import cfg as g_cfg
from utils import load_config
# overwrite configs if specified, otherwise default config is used
BACKBONE_CFGS = {
"R-50-FPN": "e2e_faster_rcnn_R_50_FPN_1x.yaml",
"R-101-FPN": "e2e_faster_rcnn_R_101_FPN_1x.yaml",
"R-152-FPN": "e2e_faster_rcnn_R_101_FPN_1x.yaml",
"R-50-FPN-RETINANET": "retinanet/retinanet_R-50-FPN_1x.yaml",
"R-101-FPN-RETINANET": "retinanet/retinanet_R-101-FPN_1x.yaml",
}
class TestBackbones(unittest.TestCase):
def test_build_backbones(self):
''' Make sure backbones run '''
self.assertGreater(len(registry.BACKBONES), 0)
for name, backbone_builder in registry.BACKBONES.items():
print('Testing {}...'.format(name))
if name in BACKBONE_CFGS:
cfg = load_config(BACKBONE_CFGS[name])
else:
# Use default config if config file is not specified
cfg = copy.deepcopy(g_cfg)
backbone = backbone_builder(cfg)
# make sure the backbone has `out_channels`
self.assertIsNotNone(
getattr(backbone, 'out_channels', None),
'Need to provide out_channels for backbone {}'.format(name)
)
N, C_in, H, W = 2, 3, 224, 256
input = torch.rand([N, C_in, H, W], dtype=torch.float32)
out = backbone(input)
for cur_out in out:
self.assertEqual(
cur_out.shape[:2],
torch.Size([N, backbone.out_channels])
)
if __name__ == "__main__":
unittest.main()
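# Sketch of a backbone that would satisfy the test above ("TINY-CONV" is a
# hypothetical name): the builder takes cfg, the returned module exposes
# out_channels, and forward returns a list of feature maps.
from torch import nn
from maskrcnn_benchmark.modeling import registry


class TinyConvBody(nn.Module):
    def __init__(self):
        super(TinyConvBody, self).__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.out_channels = 32

    def forward(self, x):
        # one feature map per pyramid level; a single level in this sketch
        return [self.stem(x)]


@registry.BACKBONES.register("TINY-CONV")
def build_tiny_conv_backbone(cfg):
    return TinyConvBody()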
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import numpy as np
import torch
from maskrcnn_benchmark.modeling.box_coder import BoxCoder
class TestBoxCoder(unittest.TestCase):
def test_box_decoder(self):
""" Match unit test UtilsBoxesTest.TestBboxTransformRandom in
caffe2/operators/generate_proposals_op_util_boxes_test.cc
"""
box_coder = BoxCoder(weights=(1.0, 1.0, 1.0, 1.0))
bbox = torch.from_numpy(
np.array(
[
175.62031555,
20.91103172,
253.352005,
155.0145874,
169.24636841,
4.85241556,
228.8605957,
105.02092743,
181.77426147,
199.82876587,
192.88427734,
214.0255127,
174.36262512,
186.75761414,
296.19091797,
231.27906799,
22.73153877,
92.02596283,
135.5695343,
208.80291748,
]
)
.astype(np.float32)
.reshape(-1, 4)
)
deltas = torch.from_numpy(
np.array(
[
0.47861834,
0.13992102,
0.14961673,
0.71495209,
0.29915856,
-0.35664671,
0.89018666,
0.70815367,
-0.03852064,
0.44466892,
0.49492538,
0.71409376,
0.28052918,
0.02184832,
0.65289006,
1.05060139,
-0.38172557,
-0.08533806,
-0.60335309,
0.79052375,
]
)
.astype(np.float32)
.reshape(-1, 4)
)
gt_bbox = (
np.array(
[
206.949539,
-30.715202,
297.387665,
244.448486,
143.871216,
-83.342888,
290.502289,
121.053398,
177.430283,
198.666245,
196.295273,
228.703079,
152.251892,
145.431564,
387.215454,
274.594238,
5.062420,
11.040955,
66.328903,
269.686218,
]
)
.astype(np.float32)
.reshape(-1, 4)
)
results = box_coder.decode(deltas, bbox)
np.testing.assert_allclose(results.detach().numpy(), gt_bbox, atol=1e-4)
if __name__ == "__main__":
unittest.main()
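# Hand-worked version of what decode computes for weights (1, 1, 1, 1), written
# with numpy; the +1 on widths/heights follows the legacy Detectron/Caffe2 box
# convention this test matches (the real BoxCoder additionally clamps dw/dh).
import numpy as np

def decode_one(delta, box):
    dx, dy, dw, dh = delta
    x1, y1, x2, y2 = box
    w, h = x2 - x1 + 1, y2 - y1 + 1
    ctr_x, ctr_y = x1 + 0.5 * w, y1 + 0.5 * h
    pred_ctr_x, pred_ctr_y = dx * w + ctr_x, dy * h + ctr_y
    pred_w, pred_h = np.exp(dw) * w, np.exp(dh) * h
    return np.array([
        pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
        pred_ctr_x + 0.5 * pred_w - 1, pred_ctr_y + 0.5 * pred_h - 1,
    ])

# decode_one([0.47862, 0.13992, 0.14962, 0.71495], [175.62, 20.91, 253.35, 155.01])
# gives roughly [206.95, -30.72, 297.39, 244.45], the first row of gt_bbox above.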
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import glob
import os
import utils
class TestConfigs(unittest.TestCase):
def test_configs_load(self):
''' Make sure configs are loadable '''
cfg_root_path = utils.get_config_root_path()
files = glob.glob(
os.path.join(cfg_root_path, "./**/*.yaml"), recursive=True)
self.assertGreater(len(files), 0)
for fn in files:
print('Loading {}...'.format(fn))
utils.load_config_from_file(fn)
if __name__ == "__main__":
unittest.main()
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import numpy as np
import torch
import maskrcnn_benchmark.modeling.backbone.fbnet_builder as fbnet_builder
TEST_CUDA = torch.cuda.is_available()
def _test_primitive(self, device, op_name, op_func, N, C_in, C_out, expand, stride):
op = op_func(C_in, C_out, expand, stride).to(device)
input = torch.rand([N, C_in, 7, 7], dtype=torch.float32).to(device)
output = op(input)
self.assertEqual(
output.shape[:2], torch.Size([N, C_out]),
'Primitive {} failed for shape {}.'.format(op_name, input.shape)
)
class TestFBNetBuilder(unittest.TestCase):
def test_identity(self):
id_op = fbnet_builder.Identity(20, 20, 1)
input = torch.rand([10, 20, 7, 7], dtype=torch.float32)
output = id_op(input)
np.testing.assert_array_equal(np.array(input), np.array(output))
id_op = fbnet_builder.Identity(20, 40, 2)
input = torch.rand([10, 20, 7, 7], dtype=torch.float32)
output = id_op(input)
np.testing.assert_array_equal(output.shape, [10, 40, 4, 4])
def test_primitives(self):
''' Make sure the primitives run '''
for op_name, op_func in fbnet_builder.PRIMITIVES.items():
print('Testing {}'.format(op_name))
_test_primitive(
self, "cpu",
op_name, op_func,
N=20, C_in=16, C_out=32, expand=4, stride=1
)
@unittest.skipIf(not TEST_CUDA, "no CUDA detected")
def test_primitives_cuda(self):
''' Make sure the primitives run on CUDA '''
for op_name, op_func in fbnet_builder.PRIMITIVES.items():
print('Testing {}'.format(op_name))
_test_primitive(
self, "cuda",
op_name, op_func,
N=20, C_in=16, C_out=32, expand=4, stride=1
)
def test_primitives_empty_batch(self):
''' Make sure the primitives run with an empty batch '''
for op_name, op_func in fbnet_builder.PRIMITIVES.items():
print('Testing {}'.format(op_name))
# test empty batch size
_test_primitive(
self, "cpu",
op_name, op_func,
N=0, C_in=16, C_out=32, expand=4, stride=1
)
@unittest.skipIf(not TEST_CUDA, "no CUDA detected")
def test_primitives_cuda_empty_batch(self):
''' Make sure the primitives run on CUDA with an empty batch '''
for op_name, op_func in fbnet_builder.PRIMITIVES.items():
print('Testing {}'.format(op_name))
# test empty batch size
_test_primitive(
self, "cuda",
op_name, op_func,
N=0, C_in=16, C_out=32, expand=4, stride=1
)
if __name__ == "__main__":
unittest.main()
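# Quick interactive check of a single primitive, using the same
# (C_in, C_out, expand, stride) call convention as _test_primitive above; the
# first registered key is taken so no primitive name has to be hard-coded.
import torch
import maskrcnn_benchmark.modeling.backbone.fbnet_builder as fbnet_builder

op_name, op_func = next(iter(fbnet_builder.PRIMITIVES.items()))
op = op_func(16, 32, 4, 1)
out = op(torch.rand(2, 16, 7, 7))
print(op_name, tuple(out.shape))  # channels become 32; spatial size depends on stride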
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import copy
import torch
# import modules to register feature extractors
from maskrcnn_benchmark.modeling.backbone import build_backbone # NoQA
from maskrcnn_benchmark.modeling.roi_heads.roi_heads import build_roi_heads # NoQA
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.structures.bounding_box import BoxList
from maskrcnn_benchmark.config import cfg as g_cfg
from utils import load_config
# overwrite configs if specified, otherwise default config is used
FEATURE_EXTRACTORS_CFGS = {
}
# overwrite configs if specified, otherwise default config is used
FEATURE_EXTRACTORS_INPUT_CHANNELS = {
# this extractor ignores in_channels and derives its input size from the config,
# so pass the matching value (1024)
"ResNet50Conv5ROIFeatureExtractor": 1024,
}
def _test_feature_extractors(
self, extractors, overwrite_cfgs, overwrite_in_channels
):
''' Make sure roi feature extractors run '''
self.assertGreater(len(extractors), 0)
in_channels_default = 64
for name, builder in extractors.items():
print('Testing {}...'.format(name))
if name in overwrite_cfgs:
cfg = load_config(overwrite_cfgs[name])
else:
# Use default config if config file is not specified
cfg = copy.deepcopy(g_cfg)
in_channels = overwrite_in_channels.get(
name, in_channels_default)
fe = builder(cfg, in_channels)
self.assertIsNotNone(
getattr(fe, 'out_channels', None),
'Need to provide out_channels for feature extractor {}'.format(name)
)
N, C_in, H, W = 2, in_channels, 24, 32
input = torch.rand([N, C_in, H, W], dtype=torch.float32)
bboxes = [[1, 1, 10, 10], [5, 5, 8, 8], [2, 2, 3, 4]]
img_size = [384, 512]
box_list = BoxList(bboxes, img_size, "xyxy")
out = fe([input], [box_list] * N)
self.assertEqual(
out.shape[:2],
torch.Size([N * len(bboxes), fe.out_channels])
)
class TestFeatureExtractors(unittest.TestCase):
def test_roi_box_feature_extractors(self):
''' Make sure roi box feature extractors run '''
_test_feature_extractors(
self,
registry.ROI_BOX_FEATURE_EXTRACTORS,
FEATURE_EXTRACTORS_CFGS,
FEATURE_EXTRACTORS_INPUT_CHANNELS,
)
def test_roi_keypoints_feature_extractors(self):
''' Make sure roi keypoints feature extractors run '''
_test_feature_extractors(
self,
registry.ROI_KEYPOINT_FEATURE_EXTRACTORS,
FEATURE_EXTRACTORS_CFGS,
FEATURE_EXTRACTORS_INPUT_CHANNELS,
)
def test_roi_mask_feature_extractors(self):
''' Make sure roi mask feature extractors run '''
_test_feature_extractors(
self,
registry.ROI_MASK_FEATURE_EXTRACTORS,
FEATURE_EXTRACTORS_CFGS,
FEATURE_EXTRACTORS_INPUT_CHANNELS,
)
if __name__ == "__main__":
unittest.main()
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import numpy as np
import torch
from maskrcnn_benchmark.layers import nms as box_nms
class TestNMS(unittest.TestCase):
def test_nms_cpu(self):
""" Match unit test UtilsNMSTest.TestNMS in
caffe2/operators/generate_proposals_op_util_nms_test.cc
"""
inputs = (
np.array(
[
10,
10,
50,
60,
0.5,
11,
12,
48,
60,
0.7,
8,
9,
40,
50,
0.6,
100,
100,
150,
140,
0.9,
99,
110,
155,
139,
0.8,
]
)
.astype(np.float32)
.reshape(-1, 5)
)
boxes = torch.from_numpy(inputs[:, :4])
scores = torch.from_numpy(inputs[:, 4])
test_thresh = [0.1, 0.3, 0.5, 0.8, 0.9]
gt_indices = [[1, 3], [1, 3], [1, 3], [1, 2, 3, 4], [0, 1, 2, 3, 4]]
for thresh, gt_index in zip(test_thresh, gt_indices):
keep_indices = box_nms(boxes, scores, thresh)
keep_indices = np.sort(keep_indices)
np.testing.assert_array_equal(keep_indices, np.array(gt_index))
def test_nms1_cpu(self):
""" Match unit test UtilsNMSTest.TestNMS1 in
caffe2/operators/generate_proposals_op_util_nms_test.cc
"""
boxes = torch.from_numpy(
np.array(
[
[350.9821, 161.8200, 369.9685, 205.2372],
[250.5236, 154.2844, 274.1773, 204.9810],
[471.4920, 160.4118, 496.0094, 213.4244],
[352.0421, 164.5933, 366.4458, 205.9624],
[166.0765, 169.7707, 183.0102, 232.6606],
[252.3000, 183.1449, 269.6541, 210.6747],
[469.7862, 162.0192, 482.1673, 187.0053],
[168.4862, 174.2567, 181.7437, 232.9379],
[470.3290, 162.3442, 496.4272, 214.6296],
[251.0450, 155.5911, 272.2693, 203.3675],
[252.0326, 154.7950, 273.7404, 195.3671],
[351.7479, 161.9567, 370.6432, 204.3047],
[496.3306, 161.7157, 515.0573, 210.7200],
[471.0749, 162.6143, 485.3374, 207.3448],
[250.9745, 160.7633, 264.1924, 206.8350],
[470.4792, 169.0351, 487.1934, 220.2984],
[474.4227, 161.9546, 513.1018, 215.5193],
[251.9428, 184.1950, 262.6937, 207.6416],
[252.6623, 175.0252, 269.8806, 213.7584],
[260.9884, 157.0351, 288.3554, 206.6027],
[251.3629, 164.5101, 263.2179, 202.4203],
[471.8361, 190.8142, 485.6812, 220.8586],
[248.6243, 156.9628, 264.3355, 199.2767],
[495.1643, 158.0483, 512.6261, 184.4192],
[376.8718, 168.0144, 387.3584, 201.3210],
[122.9191, 160.7433, 172.5612, 231.3837],
[350.3857, 175.8806, 366.2500, 205.4329],
[115.2958, 162.7822, 161.9776, 229.6147],
[168.4375, 177.4041, 180.8028, 232.4551],
[169.7939, 184.4330, 181.4767, 232.1220],
[347.7536, 175.9356, 355.8637, 197.5586],
[495.5434, 164.6059, 516.4031, 207.7053],
[172.1216, 194.6033, 183.1217, 235.2653],
[264.2654, 181.5540, 288.4626, 214.0170],
[111.7971, 183.7748, 137.3745, 225.9724],
[253.4919, 186.3945, 280.8694, 210.0731],
[165.5334, 169.7344, 185.9159, 232.8514],
[348.3662, 184.5187, 354.9081, 201.4038],
[164.6562, 162.5724, 186.3108, 233.5010],
[113.2999, 186.8410, 135.8841, 219.7642],
[117.0282, 179.8009, 142.5375, 221.0736],
[462.1312, 161.1004, 495.3576, 217.2208],
[462.5800, 159.9310, 501.2937, 224.1655],
[503.5242, 170.0733, 518.3792, 209.0113],
[250.3658, 195.5925, 260.6523, 212.4679],
[108.8287, 163.6994, 146.3642, 229.7261],
[256.7617, 187.3123, 288.8407, 211.2013],
[161.2781, 167.4801, 186.3751, 232.7133],
[115.3760, 177.5859, 163.3512, 236.9660],
[248.9077, 188.0919, 264.8579, 207.9718],
[108.1349, 160.7851, 143.6370, 229.6243],
[465.0900, 156.7555, 490.3561, 213.5704],
[107.5338, 173.4323, 141.0704, 235.2910],
]
).astype(np.float32)
)
scores = torch.from_numpy(
np.array(
[
0.1919,
0.3293,
0.0860,
0.1600,
0.1885,
0.4297,
0.0974,
0.2711,
0.1483,
0.1173,
0.1034,
0.2915,
0.1993,
0.0677,
0.3217,
0.0966,
0.0526,
0.5675,
0.3130,
0.1592,
0.1353,
0.0634,
0.1557,
0.1512,
0.0699,
0.0545,
0.2692,
0.1143,
0.0572,
0.1990,
0.0558,
0.1500,
0.2214,
0.1878,
0.2501,
0.1343,
0.0809,
0.1266,
0.0743,
0.0896,
0.0781,
0.0983,
0.0557,
0.0623,
0.5808,
0.3090,
0.1050,
0.0524,
0.0513,
0.4501,
0.4167,
0.0623,
0.1749,
]
).astype(np.float32)
)
gt_indices = np.array(
[
1,
6,
7,
8,
11,
12,
13,
14,
17,
18,
19,
21,
23,
24,
25,
26,
30,
32,
33,
34,
35,
37,
43,
44,
47,
50,
]
)
keep_indices = box_nms(boxes, scores, 0.5)
keep_indices = np.sort(keep_indices)
np.testing.assert_array_equal(keep_indices, gt_indices)
if __name__ == "__main__":
unittest.main()
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import copy
import torch
# import modules to register predictors
from maskrcnn_benchmark.modeling.backbone import build_backbone # NoQA
from maskrcnn_benchmark.modeling.roi_heads.roi_heads import build_roi_heads # NoQA
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.config import cfg as g_cfg
from utils import load_config
# overwrite configs if specified, otherwise default config is used
PREDICTOR_CFGS = {
}
# overwrite configs if specified, otherwise default config is used
PREDICTOR_INPUT_CHANNELS = {
}
def _test_predictors(
self, predictors, overwrite_cfgs, overwrite_in_channels,
hwsize,
):
''' Make sure predictors run '''
self.assertGreater(len(predictors), 0)
in_channels_default = 64
for name, builder in predictors.items():
print('Testing {}...'.format(name))
if name in overwrite_cfgs:
cfg = load_config(overwrite_cfgs[name])
else:
# Use default config if config file is not specified
cfg = copy.deepcopy(g_cfg)
in_channels = overwrite_in_channels.get(
name, in_channels_default)
fe = builder(cfg, in_channels)
N, C_in, H, W = 2, in_channels, hwsize, hwsize
input = torch.rand([N, C_in, H, W], dtype=torch.float32)
out = fe(input)
yield input, out, cfg
class TestPredictors(unittest.TestCase):
def test_roi_box_predictors(self):
''' Make sure roi box predictors run '''
for cur_in, cur_out, cur_cfg in _test_predictors(
self,
registry.ROI_BOX_PREDICTOR,
PREDICTOR_CFGS,
PREDICTOR_INPUT_CHANNELS,
hwsize=1,
):
self.assertEqual(len(cur_out), 2)
scores, bbox_deltas = cur_out[0], cur_out[1]
self.assertEqual(
scores.shape[1], cur_cfg.MODEL.ROI_BOX_HEAD.NUM_CLASSES)
self.assertEqual(scores.shape[0], cur_in.shape[0])
self.assertEqual(scores.shape[0], bbox_deltas.shape[0])
self.assertEqual(scores.shape[1] * 4, bbox_deltas.shape[1])
def test_roi_keypoints_predictors(self):
''' Make sure roi keypoint predictors run '''
for cur_in, cur_out, cur_cfg in _test_predictors(
self,
registry.ROI_KEYPOINT_PREDICTOR,
PREDICTOR_CFGS,
PREDICTOR_INPUT_CHANNELS,
hwsize=14,
):
self.assertEqual(cur_out.shape[0], cur_in.shape[0])
self.assertEqual(
cur_out.shape[1], cur_cfg.MODEL.ROI_KEYPOINT_HEAD.NUM_CLASSES)
def test_roi_mask_predictors(self):
''' Make sure roi mask predictors run '''
for cur_in, cur_out, cur_cfg in _test_predictors(
self,
registry.ROI_MASK_PREDICTOR,
PREDICTOR_CFGS,
PREDICTOR_INPUT_CHANNELS,
hwsize=14,
):
self.assertEqual(cur_out.shape[0], cur_in.shape[0])
self.assertEqual(
cur_out.shape[1], cur_cfg.MODEL.ROI_BOX_HEAD.NUM_CLASSES)
if __name__ == "__main__":
unittest.main()
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import unittest
import copy
import torch
# import modules to register rpn heads
from maskrcnn_benchmark.modeling.backbone import build_backbone # NoQA
from maskrcnn_benchmark.modeling.rpn.rpn import build_rpn # NoQA
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.config import cfg as g_cfg
from utils import load_config
# overwrite configs if specified, otherwise default config is used
RPN_CFGS = {
}
class TestRPNHeads(unittest.TestCase):
def test_build_rpn_heads(self):
''' Make sure rpn heads run '''
self.assertGreater(len(registry.RPN_HEADS), 0)
in_channels = 64
num_anchors = 10
for name, builder in registry.RPN_HEADS.items():
print('Testing {}...'.format(name))
if name in RPN_CFGS:
cfg = load_config(RPN_CFGS[name])
else:
# Use default config if config file is not specified
cfg = copy.deepcopy(g_cfg)
rpn = builder(cfg, in_channels, num_anchors)
N, C_in, H, W = 2, in_channels, 24, 32
input = torch.rand([N, C_in, H, W], dtype=torch.float32)
LAYERS = 3
out = rpn([input] * LAYERS)
self.assertEqual(len(out), 2)
logits, bbox_reg = out
for idx in range(LAYERS):
self.assertEqual(
logits[idx].shape,
torch.Size([
input.shape[0], num_anchors,
input.shape[2], input.shape[3],
])
)
self.assertEqual(
bbox_reg[idx].shape,
torch.Size([
logits[idx].shape[0], num_anchors * 4,
logits[idx].shape[2], logits[idx].shape[3],
]),
)
if __name__ == "__main__":
unittest.main()
from __future__ import absolute_import, division, print_function, unicode_literals
# Set up custom environment before nearly anything else is imported
# NOTE: this should be the first import (do not reorder)
from maskrcnn_benchmark.utils.env import setup_environment # noqa F401 isort:skip
import env_tests.env as env_tests
import os
import copy
from maskrcnn_benchmark.config import cfg as g_cfg
def get_config_root_path():
return env_tests.get_config_root_path()
def load_config(rel_path):
''' Load config from file path specified as path relative to config_root '''
cfg_path = os.path.join(env_tests.get_config_root_path(), rel_path)
return load_config_from_file(cfg_path)
def load_config_from_file(file_path):
''' Load config from file path specified as absolute path '''
ret = copy.deepcopy(g_cfg)
ret.merge_from_file(file_path)
return ret