Commit 2f8161ed authored by Ross Girshick, committed by facebook-github-bot

Initial commit

fbshipit-source-id: d6798bb3ead07e6e3da5edebc53b946e6cfa0807
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# Shared objects
*.so
# Distribution / packaging
lib/build/
*.egg-info/
*.egg
# Temporary files
*.swn
*.swo
*.swp
# Dataset symlinks
lib/datasets/data/*
!lib/datasets/data/README.md
# Generated C files
lib/utils/cython_*.c
# Contributing to Detectron
We want to make contributing to this project as easy and transparent as
possible.
## Our Development Process
Minor changes and improvements will be released on an ongoing basis. Larger
changes (e.g., changesets implementing a new paper) will be released on a more
periodic basis.
## Pull Requests
We actively welcome your pull requests.
1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. Ensure no regressions in baseline model speed and accuracy.
7. If you haven't already, complete the Contributor License Agreement ("CLA").
## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
## Issues
GitHub issues will be largely unattended and are mainly intended as a community
forum for collectively debugging issues, hopefully leading to pull requests with
fixes when appropriate.
## Coding Style
* 4 spaces for indentation rather than tabs
* 80 character line length
* PEP8 formatting
## License
By contributing to Detectron, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
# FAQ
This document covers frequently asked questions.
- For general information about Detectron, please see [`README.md`](README.md).
- For installation instructions, please see [`INSTALL.md`](INSTALL.md).
- For a quick getting started guide, please see [`GETTING_STARTED.md`](GETTING_STARTED.md).
#### Q: How do I compute validation AP during training?
**A:** Detectron does not compute validation statistics (e.g., AP) during training because this slows training. Instead, we've implemented a "validation monitor", which is a process that polls for new model checkpoints saved by a training job and when one is found performs inference with it by scheduling a job with `tools/test_net.py` asynchronously using free GPUs in our cluster. We have not released the validation monitor because (1) it's a relatively thin wrapper on top of `tools/train_net.py` and (2) the little code that comprises it is specific to our cluster and would not be generally useful.
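For reference, the idea is simple enough to sketch. The following is a hypothetical monitor (not part of Detectron) that assumes checkpoints are saved as `model_iter*.pkl` in the training output directory:
```
# Hypothetical validation monitor sketch (not part of Detectron).
# Polls a training output directory and runs tools/test_net.py on each
# new checkpoint; assumes checkpoints are named model_iter*.pkl.
import glob
import os
import subprocess
import time

def monitor(output_dir, cfg_file, poll_secs=300):
    tested = set()
    while True:
        for ckpt in sorted(glob.glob(os.path.join(output_dir, 'model_iter*.pkl'))):
            if ckpt not in tested:
                # Schedule inference asynchronously on the new checkpoint
                subprocess.Popen([
                    'python2', 'tools/test_net.py',
                    '--cfg', cfg_file,
                    'TEST.WEIGHTS', ckpt,
                    'NUM_GPUS', '1',
                ])
                tested.add(ckpt)
        time.sleep(poll_secs)
```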
#### Q: How do I restrict Detectron to use only a subset of the GPUs on a server?
**A:** Don't modify the code; use the [`CUDA_VISIBLE_DEVICES`](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) environment variable instead.
#### Q: Why is detection on one image really slow compared to the reported performance?
**A:** Various algorithms and caches (e.g., from `cudnn`) take some time to warm up. Peak inference performance will not be reached until after a few images have been processed.
Also potentially relevant: inference with Mask R-CNN on high-resolution images may be slow simply because substantial time is spent upsampling the predicted masks to the original image resolution (this has not been optimized). You can diagnose this issue if the `misc_mask` time reported by `tools/infer_simple.py` is high (e.g., much more than 20-90ms). The solution is to first resize your images such that the short side is around 600-800px (the exact choice does not matter) and then run inference on the resized image.
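For example, a minimal resize along these lines (using OpenCV; the 800px target is illustrative):
```
# Resize an image so its short side is ~800px before running inference.
import cv2

def resize_short_side(im, target=800):
    h, w = im.shape[:2]
    scale = float(target) / min(h, w)
    return cv2.resize(im, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_LINEAR)

im = cv2.imread('/path/to/image.jpg')
im_small = resize_short_side(im)
cv2.imwrite('/path/to/image_small.jpg', im_small)
```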
#### Q: How do I implement a custom Caffe2 CPU or GPU operator for use in Detectron?
**A:** Detectron uses a number of specialized Caffe2 operators that are distributed via the [Caffe2 Detectron module](https://github.com/caffe2/caffe2/tree/master/modules/detectron) as part of the core Caffe2 GitHub repository. If you'd like to implement a custom Caffe2 operator for your project, we have written a toy example illustrating how to add an operator under the Detectron source tree; please see [`lib/ops/zero_even_op.*`](lib/ops/) and [`tests/test_zero_even_op.py`](tests/test_zero_even_op.py). For more background on writing Caffe2 operators please consult the [Caffe2 documentation](https://caffe2.ai/docs/custom-operators.html).
#### Q: How do I use Detectron to train a model on a custom dataset?
**A:** If possible, we strongly recommend that you first convert the custom dataset annotation format to the [COCO API json format](http://cocodataset.org/#download). Then, add your dataset to the [dataset catalog](lib/datasets/dataset_catalog.py) so that Detectron can use it for training and inference. If your dataset cannot be converted to the COCO API json format, then it's likely that more significant code modifications will be required. If the dataset you're adding is popular, please consider making the converted annotations publicly available; if code modifications are required, please consider submitting a pull request.
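For illustration, a new catalog entry generally mirrors the existing COCO entries. The sketch below is hypothetical: all paths are placeholders, and the `IM_DIR`/`ANN_FN` keys follow the pattern used in `lib/datasets/dataset_catalog.py`:
```
# Hypothetical dataset catalog entry for a custom dataset with
# COCO-style json annotations; all paths below are placeholders.
IM_DIR = 'image_directory'
ANN_FN = 'annotation_file'

DATASETS = {
    'my_dataset_train': {
        IM_DIR: '/path/to/my_dataset/images',
        ANN_FN: '/path/to/my_dataset/annotations/instances_train.json',
    },
}
```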
# Using Detectron
This document provides brief tutorials covering Detectron for inference and training on the COCO dataset.
- For general information about Detectron, please see [`README.md`](README.md).
- For installation instructions, please see [`INSTALL.md`](INSTALL.md).
## Inference with Pretrained Models
#### 1. Directory of Image Files
To run inference on a directory of image files (`demo/*.jpg` in this example), you can use the `infer_simple.py` tool. In this example, we're using an end-to-end trained Mask R-CNN model with a ResNet-101-FPN backbone from the model zoo:
```
python2 tools/infer_simple.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir /tmp/detectron-visualizations \
--image-ext jpg \
--wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
demo
```
Detectron should automatically download the model from the URL specified by the `--wts` argument. This tool will output visualizations of the detections in PDF format in the directory specified by `--output-dir`. Here's an example of the output you should expect to see (for copyright information about the demo images see [`demo/NOTICE`](demo/NOTICE)).
<div align="center">
<img src="demo/output/17790319373_bd19b24cfc_k_example_output.jpg" width="700px" />
<p>Example Mask R-CNN output.</p>
</div>
**Notes:**
- When running inference on your own high-resolution images, Mask R-CNN may be slow simply because substantial time is spent upsampling the predicted masks to the original image resolution (this has not been optimized). You can diagnose this issue if the `misc_mask` time reported by `tools/infer_simple.py` is high (e.g., much more than 20-90ms). The solution is to first resize your images such that the short side is around 600-800px (the exact choice does not matter) and then run inference on the resized image.
#### 2. COCO Dataset
This example shows how to run an end-to-end trained Mask R-CNN model from the model zoo using a single GPU for inference. As configured, this will run inference on all images in `coco_2014_minival` (which must be properly installed).
```
python2 tools/test_net.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
TEST.WEIGHTS https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
NUM_GPUS 1
```
Running inference with the same model using `$N` GPUs (e.g., `N=8`):
```
python2 tools/test_net.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--multi-gpu-testing \
TEST.WEIGHTS https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
NUM_GPUS $N
```
On an NVIDIA Tesla P100 GPU, inference should take about 130-140 ms per image for this example.
## Training a Model with Detectron
This is a tiny tutorial showing how to train a model on COCO. The model will be an end-to-end trained Faster R-CNN using a ResNet-50-FPN backbone. For the purpose of this tutorial, we'll use a short training schedule and a small input image size so that training and inference will be relatively fast. As a result, the box AP on COCO will be relatively low compared to our [baselines](MODEL_ZOO.md). This example is provided for instructive purposes only (i.e., not for comparing against publications).
#### 1. Training with 1 GPU
```
python2 tools/train_net.py \
--cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
OUTPUT_DIR /tmp/detectron-output
```
**Expected results:**
- Output (models, validation set detections, etc.) will be saved under `/tmp/detectron-output`
- On a Maxwell generation GPU (e.g., M40), training should take around 4.2 hours
- Inference time should be around 80ms / image (also on an M40)
- Box AP on `coco_2014_minival` should be around 22.1% (+/- 0.1% stdev measured over 3 runs)
#### 2. Multi-GPU Training
We've also provided configs to illustrate training with 2, 4, and 8 GPUs using learning schedules that will be approximately equivalent to the one used with 1 GPU above. The configs are located at: `configs/getting_started/tutorial_{2,4,8}gpu_e2e_faster_rcnn_R-50-FPN.yaml`. For example, launching a training job with 2 GPUs will look like this:
```
python2 tools/train_net.py \
--multi-gpu-testing \
--cfg configs/getting_started/tutorial_2gpu_e2e_faster_rcnn_R-50-FPN.yaml \
OUTPUT_DIR /tmp/detectron-output
```
Note that we've also added the `--multi-gpu-testing` flag to instruct Detectron to parallelize inference over multiple GPUs (2 in this example; see `NUM_GPUS` in the config file) after training has finished.
**Expected results:**
- Training should take around 2.3 hours (2 x M40)
- Inference time should be around 80ms / image (but in parallel on 2 GPUs, so half the total time)
- Box AP on `coco_2014_minival` should be around 22.1% (+/- 0.1% stdev measured over 3 runs)
To understand how learning schedules are adjusted (the "linear scaling rule"), please study these tutorial config files and read our paper [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677). **Aside from this tutorial, all of our released configs make use of 8 GPUs. If you will be using fewer than 8 GPUs for training (or do anything else that changes the minibatch size), it is essential that you understand how to manipulate training schedules according to the linear scaling rule.**
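As a concrete sketch of the arithmetic (not a Detectron utility): scaling from the 8-GPU baseline schedule used throughout this repository to `N` GPUs divides the learning rate by `8/N` and multiplies all iteration counts by `8/N`, assuming the number of images per GPU is unchanged.
```
# Linear scaling rule sketch: scale the 8-GPU baseline schedule
# (BASE_LR 0.02, MAX_ITER 90000, STEPS [0, 60000, 80000]) to fewer GPUs,
# assuming the number of images per GPU stays fixed.
def scale_schedule(base_lr, max_iter, steps, base_gpus, gpus):
    factor = float(base_gpus) / gpus  # minibatch shrinks by this factor
    return (base_lr / factor,
            int(max_iter * factor),
            [int(s * factor) for s in steps])

print(scale_schedule(0.02, 90000, [0, 60000, 80000], base_gpus=8, gpus=2))
# -> (0.005, 360000, [0, 240000, 320000])
```
This matches, e.g., the 1-image-per-GPU configs in this repository, which halve the learning rate and double the schedule relative to the 2-image-per-GPU baselines.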
**Notes:**
- This training example uses a model with relatively low GPU compute demand, so the overhead from Caffe2 Python ops is relatively high. As a result, scaling is relatively poor as the number of GPUs is increased from 2 to 8 (e.g., training with 8 GPUs takes about 0.9 hours, only 4.5x faster than with 1 GPU). Scaling improves as larger, more GPU-compute-heavy models are used.
# Installing Detectron
This document covers how to install Detectron, its dependencies (including Caffe2), and the COCO dataset.
- For general information about Detectron, please see [`README.md`](README.md).
**Requirements:**
- NVIDIA GPU, Linux, Python2
- Caffe2, various standard Python packages, and the COCO API; instructions for installing these dependencies are given below
**Notes:**
- Detectron operators currently do not have CPU implementations; a GPU system is required.
- Detectron has been tested extensively with CUDA 8.0 and cuDNN 6.0.21.
## Caffe2
To install Caffe2 with CUDA support, follow the [installation instructions](https://caffe2.ai/docs/getting-started.html) from the [Caffe2 website](https://caffe2.ai/). **If you already have Caffe2 installed, make sure to update your Caffe2 to a version that includes the [Detectron module](https://github.com/caffe2/caffe2/tree/master/modules/detectron).**
Please ensure that your Caffe2 installation was successful before proceeding by running the following commands and checking their output as directed in the comments.
```
# To check if Caffe2 build was successful
python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
# To check if Caffe2 GPU build was successful
# This must print a number > 0 in order to use Detectron
python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
```
If the `caffe2` Python package is not found, you likely need to adjust your `PYTHONPATH` environment variable to include its location (`/path/to/caffe2/build`, where `build` is the Caffe2 CMake build directory).
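Equivalently, you can extend the path from inside Python (the build path below is a placeholder):
```
# Equivalent to adjusting PYTHONPATH: make the caffe2 package importable
# from its CMake build directory (placeholder path).
import sys
sys.path.insert(0, '/path/to/caffe2/build')

from caffe2.python import core
```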
## Other Dependencies
Install Python dependencies:
```
# Note: quote 'opencv-python>=3.0' so the shell does not treat > as a redirect
pip install numpy pyyaml matplotlib 'opencv-python>=3.0' setuptools Cython mock
```
Install the [COCO API](https://github.com/cocodataset/cocoapi):
```
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
```
Note that instructions like `# COCOAPI=/path/to/clone/cocoapi` indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (`COCOAPI` in this case) accordingly.
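After installing, you can quickly verify that the COCO API is importable:
```
# Quick sanity check that the COCO API installed correctly.
from pycocotools.coco import COCO
print(COCO)
```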
## Detectron
Clone the Detectron repository:
```
# DETECTRON=/path/to/clone/detectron
git clone https://github.com/facebookresearch/detectron $DETECTRON
```
Set up Python modules:
```
cd $DETECTRON/lib && make
```
Check that Detectron tests pass (e.g., the [`SpatialNarrowAsOp` test](tests/test_spatial_narrow_as_op.py)):
```
python2 $DETECTRON/tests/test_spatial_narrow_as_op.py
```
## That's All You Need for Inference
At this point, you can run inference using pretrained Detectron models. Take a look at our [inference tutorial](GETTING_STARTED.md) for an example. If you want to train models on the COCO dataset, then please continue with the installation instructions.
## Datasets
Detectron finds datasets via symlinks from `lib/datasets/data` to the actual locations where the dataset images and annotations are stored. For instructions on how to create symlinks for COCO and other datasets, please see [`lib/datasets/data/README.md`](lib/datasets/data/README.md).
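For example, a COCO symlink can be created along the following lines (see the linked README for the authoritative layout; the source path is a placeholder):
```
# Hypothetical sketch: symlink a local COCO copy into the location
# Detectron expects (see lib/datasets/data/README.md for details).
import os

os.symlink('/path/to/coco', 'lib/datasets/data/coco')
```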
After symlinks have been created, that's all you need to start training models.
## Advanced Topic: Custom Operators for New Research Projects
Please read the custom operators section of the [`FAQ`](FAQ.md) first.
For convenience, we provide CMake support for building custom operators. All custom operators are built into a single library that can be loaded dynamically from Python.
Place your custom operator implementation under [`lib/ops/`](lib/ops/) and see [`tests/test_zero_even_op.py`](tests/test_zero_even_op.py) for an example of how to load custom operators from Python.
Build the custom operators library:
```
cd $DETECTRON/lib && make ops
```
Check that the custom operator tests pass:
```
python2 $DETECTRON/tests/test_zero_even_op.py
```
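For reference, loading the built library from your own Python code follows the same pattern as the test. A minimal sketch, where the library filename is a placeholder for what `make ops` produces under `lib/build`:
```
# Minimal sketch of loading the custom operators library from Python,
# following the pattern in tests/test_zero_even_op.py; the library path
# is a placeholder for the file produced by `make ops`.
from caffe2.python import dyndep

dyndep.InitOpsLibrary('lib/build/libcaffe2_detectron_custom_ops_gpu.so')
```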
## Docker Image
We provide a [`Dockerfile`](docker/Dockerfile) that you can use to build a Detectron image on top of a Caffe2 image that satisfies the requirements outlined at the top. If you're using a prebuilt Caffe2 image (e.g., from the [docker repo](https://hub.docker.com/r/caffe2ai/caffe2/)), please make sure that it includes the [Detectron module](https://github.com/caffe2/caffe2/tree/master/modules/detectron). We also provide an example of how to build an up-to-date Caffe2 image.
Build a Caffe2 image:
```
cd /path/to/caffe2/docker/ubuntu-16.04-cuda8-cudnn6-all-options
# Use the latest Caffe2 master
sed -i -e 's/ --branch v0.8.1//g' Dockerfile
docker build -t caffe2:cuda8-cudnn6-all-options .
```
Build a Detectron image:
```
cd $DETECTRON/docker
docker build -t detectron:c2-cuda8-cudnn6 .
```
Run the Detectron image (e.g., to run the [`BatchPermutationOp` test](tests/test_batch_permutation_op.py)):
```
nvidia-docker run --rm -it detectron:c2-cuda8-cudnn6 python2 tests/test_batch_permutation_op.py
```
## Troubleshooting
In case of Caffe2 installation problems, please read the troubleshooting section of the relevant Caffe2 [installation instructions](https://caffe2.ai/docs/getting-started.html) first. In the following, we provide additional troubleshooting tips for Caffe2 and Detectron.
### Caffe2 Operator Profiling
Caffe2 comes with performance [`profiling`](https://github.com/caffe2/caffe2/tree/master/caffe2/contrib/prof)
support which you may find useful for benchmarking or debugging your operators
(see [`BatchPermutationOp test`](tests/test_batch_permutation_op.py) for example usage).
Profiling support is not built by default and you can enable it by setting
the `-DUSE_PROF=ON` flag when running Caffe2 CMake.
### CMake Cannot Find CUDA and cuDNN
Sometimes CMake has trouble with finding CUDA and cuDNN dirs on your machine.
When building Caffe2, you can point CMake to CUDA and cuDNN dirs by running:
```
# Insert your other Caffe2 CMake flags into the same command
cmake .. \
  -DCUDA_TOOLKIT_ROOT_DIR=/path/to/cuda/toolkit/dir \
  -DCUDNN_ROOT_DIR=/path/to/cudnn/root/dir
```
Similarly, when building custom Detectron operators you can use:
```
cd $DETECTRON/lib
mkdir -p build && cd build
cmake .. \
-DCUDA_TOOLKIT_ROOT_DIR=/path/to/cuda/toolkit/dir \
-DCUDNN_ROOT_DIR=/path/to/cudnn/root/dir
make
```
Note that you can use the same commands to get CMake to use specific versions of CUDA and cuDNN out of possibly multiple versions installed on your machine.
### Protobuf Errors
Caffe2 uses protobuf as its serialization format and requires version `3.2.0` or newer.
If your protobuf version is older, you can build protobuf from the Caffe2 protobuf submodule and use that version instead.
To build the Caffe2 protobuf submodule:
```
# CAFFE2=/path/to/caffe2
cd $CAFFE2/third_party/protobuf/cmake
mkdir -p build && cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=$HOME/c2_tp_protobuf \
-Dprotobuf_BUILD_TESTS=OFF \
-DCMAKE_CXX_FLAGS="-fPIC"
make install
```
To point Caffe2 CMake to the newly built protobuf:
```
# Insert your other Caffe2 CMake flags into the same command
cmake .. \
  -DPROTOBUF_PROTOC_EXECUTABLE=$HOME/c2_tp_protobuf/bin/protoc \
  -DPROTOBUF_INCLUDE_DIR=$HOME/c2_tp_protobuf/include \
  -DPROTOBUF_LIBRARY=$HOME/c2_tp_protobuf/lib64/libprotobuf.a
```
You may also experience problems with protobuf if you have both system and Anaconda packages installed, as the two versions can get mixed at compile time or at runtime.
This issue can be overcome by following the commands above.
### Caffe2 Python Binaries
In case you experience issues with CMake being unable to find the required Python paths when
building Caffe2 Python binaries (e.g., in virtualenv), you can try pointing Caffe2 CMake to the
Python library and include directory by using:
```
# Insert your other Caffe2 CMake flags into the same command
cmake .. \
  -DPYTHON_LIBRARY=$(python2 -c "from distutils import sysconfig; print(sysconfig.get_python_lib())") \
  -DPYTHON_INCLUDE_DIR=$(python2 -c "from distutils import sysconfig; print(sysconfig.get_python_inc())")
```
### Caffe2 with NNPACK Build
Detectron does not require Caffe2 built with NNPACK support. If you face NNPACK related issues during Caffe2 installation, you can safely disable NNPACK by setting the `-DUSE_NNPACK=OFF` CMake flag.
### Caffe2 with OpenCV Build
Analogously to the NNPACK case above, you can disable OpenCV by setting the `-DUSE_OPENCV=OFF` CMake flag.
### COCO API Undefined Symbol Error
If you encounter a COCO API import error due to an undefined symbol, as reported [here](https://github.com/cocodataset/cocoapi/issues/35),
make sure that your Python versions are not getting mixed. For instance, this issue may arise if you have
[both system and conda numpy installed](https://stackoverflow.com/questions/36190757/numpy-undefined-symbol-pyfpe-jbuf).
### CMake Cannot Find Caffe2
In case you experience issues with CMake being unable to find the Caffe2 package when building custom operators,
make sure you have run `make install` as part of your Caffe2 installation process.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Portions of this software are derived from py-faster-rcnn.
==============================================================================
py-faster-rcnn license
==============================================================================
Faster R-CNN
The MIT License (MIT)
Copyright (c) 2015 Microsoft Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
# Detectron
Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including [Mask R-CNN](https://arxiv.org/abs/1703.06870). It is written in Python and powered by the [Caffe2](https://github.com/caffe2/caffe2) deep learning framework.
At FAIR, Detectron has enabled numerous research projects, including: [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144), [Mask R-CNN](https://arxiv.org/abs/1703.06870), [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333), [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), [Non-local Neural Networks](https://arxiv.org/abs/1711.07971), [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370), and [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440).
<div align="center">
<img src="demo/output/33823288584_1d21cf0a26_k_example_output.jpg" width="700px" />
<p>Example Mask R-CNN output.</p>
</div>
## Introduction
The goal of Detectron is to provide a high-quality, high-performance
codebase for object detection *research*. It is designed to be flexible in order
to support rapid implementation and evaluation of novel research. Detectron
includes implementations of the following object detection algorithms:
- [Mask R-CNN](https://arxiv.org/abs/1703.06870) -- *Marr Prize at ICCV 2017*
- [RetinaNet](https://arxiv.org/abs/1708.02002) -- *Best Student Paper Award at ICCV 2017*
- [Faster R-CNN](https://arxiv.org/abs/1506.01497)
- [RPN](https://arxiv.org/abs/1506.01497)
- [Fast R-CNN](https://arxiv.org/abs/1504.08083)
- [R-FCN](https://arxiv.org/abs/1605.06409)
using the following backbone network architectures:
- [ResNeXt{50,101,152}](https://arxiv.org/abs/1611.05431)
- [ResNet{50,101,152}](https://arxiv.org/abs/1512.03385)
- [Feature Pyramid Networks](https://arxiv.org/abs/1612.03144) (with ResNet/ResNeXt)
- [VGG16](https://arxiv.org/abs/1409.1556)
Additional backbone architectures may be easily implemented. For more details about these models, please see [References](#references) below.
## License
Detectron is released under the [Apache 2.0 license](https://github.com/facebookresearch/detectron/blob/master/LICENSE). See the [NOTICE](https://github.com/facebookresearch/detectron/blob/master/NOTICE) file for additional details.
## Citing Detectron
If you use Detectron in your research or wish to refer to the baseline results published in the [Model Zoo](MODEL_ZOO.md), please use the following BibTeX entry.
```
@misc{Detectron2018,
author = {Ross Girshick and Ilija Radosavovic and Georgia Gkioxari and
Piotr Doll\'{a}r and Kaiming He},
title = {Detectron},
howpublished = {\url{https://github.com/facebookresearch/detectron}},
year = {2018}
}
```
## Model Zoo and Baselines
We provide a large set of baseline results and trained models available for download in the [Detectron Model Zoo](MODEL_ZOO.md).
## Installation
Please find installation instructions for Caffe2 and Detectron in [`INSTALL.md`](INSTALL.md).
## Quick Start: Using Detectron
After installation, please see [`GETTING_STARTED.md`](GETTING_STARTED.md) for brief tutorials covering inference and training with Detectron.
## Getting Help
To start, please check the [troubleshooting](INSTALL.md#troubleshooting) section of our installation instructions as well as our [FAQ](FAQ.md). If you can't find help there, try searching our GitHub issues. We intend the issues page to be a forum in which the community collectively troubleshoots problems.
If bugs are found, **we appreciate pull requests** (including adding Q&As to `FAQ.md` and improving our installation instructions and troubleshooting documents). Please see [CONTRIBUTING.md](CONTRIBUTING.md) for more information about contributing to Detectron.
## References
- [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440).
Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, and Kaiming He.
Tech report, arXiv, Dec. 2017.
- [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370).
Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick.
Tech report, arXiv, Nov. 2017.
- [Non-Local Neural Networks](https://arxiv.org/abs/1711.07971).
Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He.
Tech report, arXiv, Nov. 2017.
- [Mask R-CNN](https://arxiv.org/abs/1703.06870).
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick.
IEEE International Conference on Computer Vision (ICCV), 2017.
- [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002).
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár.
IEEE International Conference on Computer Vision (ICCV), 2017.
- [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677).
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He.
Tech report, arXiv, June 2017.
- [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333).
Georgia Gkioxari, Ross Girshick, Piotr Dollár, and Kaiming He.
Tech report, arXiv, Apr. 2017.
- [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144).
Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431).
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- [R-FCN: Object Detection via Region-based Fully Convolutional Networks](http://arxiv.org/abs/1605.06409).
Jifeng Dai, Yi Li, Kaiming He, and Jian Sun.
Conference on Neural Information Processing Systems (NIPS), 2016.
- [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385).
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](http://arxiv.org/abs/1506.01497).
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun.
Conference on Neural Information Processing Systems (NIPS), 2015.
- [Fast R-CNN](http://arxiv.org/abs/1504.08083).
Ross Girshick.
IEEE International Conference on Computer Vision (ICCV), 2015.
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 6000
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 6000
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
FASTER_RCNN: True
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
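# From here the configs switch from person keypoints to COCO instance
# segmentation: NUM_CLASSES becomes 81 (80 COCO categories plus background,
# versus 2 = person plus background above) and MASK_ON replaces KEYPOINTS_ON.
# The mask head arithmetic in the FPN configs: features are RoIAligned at
# ROI_XFORM_RESOLUTION 14 and upsampled once by 2x inside
# mask_rcnn_fcn_head_v1up4convs, which yields the 28x28 output RESOLUTION.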
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
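# The next two configs use the C4 variant: ResNet.add_ResNet50_conv4_body ends
# the backbone at conv4, so there is no FPN. The RPN therefore needs explicit
# anchor SIZES on its single feature map (and keeps 6000 pre-NMS proposals at
# test time, versus 1000 per level for FPN), the box head is the conv5 stage
# (ResNet.add_ResNet_roi_conv5_head), and the mask head is the v0upshare
# variant with 14x14 output masks. TRAIN.IMS_PER_BATCH drops to 1 (presumably
# for memory); consistent with the linear scaling rule, BASE_LR is halved to
# 0.01 and the schedule doubled to 180000 iterations, so the total images seen
# match the 16-image 1x schedule:
#   8 GPUs * 1 im * 180000 iters = 1.44M = 8 * 2 * 90000.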
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v0upshare
RESOLUTION: 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 6000
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
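# The 2x schedule below is an exact doubling of the 1x schedule above:
# MAX_ITER 360000 and STEPS [0, 240000, 320000] are twice 180000 and
# [0, 120000, 160000], so total images seen double to 8 * 1 * 360000 = 2.88M.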
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v0upshare
RESOLUTION: 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 6000
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
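# The ResNeXt-101 Mask R-CNN configs below also train with
# TRAIN.IMS_PER_BATCH: 1 (presumably for memory, as with the C4 models above),
# so they keep the halved BASE_LR of 0.01 and the doubled iteration counts
# even though they use FPN.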
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
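# The X-152-32x8d config below is the heaviest baseline in this group. Training
# adds scale jitter over six short-side scales and starts from the
# X-152-32x8d-IN5k weights (pretrained on the larger ImageNet-5k set, as the
# filename suggests). The "1.44x" schedule keeps the image budget consistent
# with the other 1-image-per-GPU configs: 8 * 1 * 260000 ~= 2.08M images,
# i.e. roughly 1.44 * 1.44M. At test time it enables box voting (VOTE_TH 0.9)
# plus multi-scale and horizontal-flip augmentation for both boxes (UNION
# score/coordinate heuristics) and masks (SOFT_AVG heuristic) over short-side
# scales 400-1200 with MAX_SIZE 2000.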
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet152_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1.44x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 260000
STEPS: [0, 200000, 240000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (640, 672, 704, 736, 768, 800) # Scale jitter
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
BBOX_VOTE:
ENABLED: True
VOTE_TH: 0.9
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
BBOX_AUG:
ENABLED: True
SCORE_HEUR: UNION
COORD_HEUR: UNION
H_FLIP: True
SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
MAX_SIZE: 2000
SCALE_H_FLIP: True
SCALE_SIZE_DEP: False
ASPECT_RATIOS: ()
ASPECT_RATIO_H_FLIP: False
MASK_AUG:
ENABLED: True
HEUR: SOFT_AVG
H_FLIP: True
SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
MAX_SIZE: 2000
SCALE_H_FLIP: True
SCALE_SIZE_DEP: False
ASPECT_RATIOS: ()
ASPECT_RATIO_H_FLIP: False
OUTPUT_DIR: .
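# The configs from here on train against precomputed RPN proposals (Fast R-CNN
# style) rather than end to end: MODEL no longer sets FASTER_RCNN, TRAIN and
# TEST supply PROPOSAL_FILES pointing at the rpn_proposals.pkl outputs of the
# matching RPN-only baselines, and TEST.PROPOSAL_LIMIT caps the proposals
# consumed per image at 1000.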
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
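# A path detail that distinguishes the proposal sources: the C4 configs below
# read proposals from RPN-only runs (".../test/<dataset>/rpn/rpn_proposals.pkl"),
# while the FPN configs use proposals written by generalized_rcnn RPN runs
# (".../test/<dataset>/generalized_rcnn/rpn_proposals.pkl").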
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_train/rpn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_valminusminival/rpn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_minival/rpn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_train/rpn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_valminusminival/rpn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_minival/rpn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
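# The keypoint configs below are the precomputed-proposal counterparts of the
# end-to-end keypoint baselines: NUM_CLASSES stays 2 (person plus background)
# and the PROPOSAL_FILES come from the person-only RPN baselines
# (rpn_person_only_*).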
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999521/12_2017_baselines/rpn_person_only_R-101-FPN_1x.yaml.08_20_33.1OkqMmqP/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35999521/12_2017_baselines/rpn_person_only_R-101-FPN_1x.yaml.08_20_33.1OkqMmqP/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999521/12_2017_baselines/rpn_person_only_R-101-FPN_1x.yaml.08_20_33.1OkqMmqP/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999521/12_2017_baselines/rpn_person_only_R-101-FPN_1x.yaml.08_20_33.1OkqMmqP/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35999521/12_2017_baselines/rpn_person_only_R-101-FPN_1x.yaml.08_20_33.1OkqMmqP/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999521/12_2017_baselines/rpn_person_only_R-101-FPN_1x.yaml.08_20_33.1OkqMmqP/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760438/12_2017_baselines/rpn_person_only_X-101-32x8d-FPN_1x.yaml.06_04_23.M2oJlDPW/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/36760438/12_2017_baselines/rpn_person_only_X-101-32x8d-FPN_1x.yaml.06_04_23.M2oJlDPW/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760438/12_2017_baselines/rpn_person_only_X-101-32x8d-FPN_1x.yaml.06_04_23.M2oJlDPW/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760438/12_2017_baselines/rpn_person_only_X-101-32x8d-FPN_1x.yaml.06_04_23.M2oJlDPW/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/36760438/12_2017_baselines/rpn_person_only_X-101-32x8d-FPN_1x.yaml.06_04_23.M2oJlDPW/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760438/12_2017_baselines/rpn_person_only_X-101-32x8d-FPN_1x.yaml.06_04_23.M2oJlDPW/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999553/12_2017_baselines/rpn_person_only_X-101-64x4d-FPN_1x.yaml.08_21_33.ghFzzArr/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35999553/12_2017_baselines/rpn_person_only_X-101-64x4d-FPN_1x.yaml.08_21_33.ghFzzArr/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999553/12_2017_baselines/rpn_person_only_X-101-64x4d-FPN_1x.yaml.08_21_33.ghFzzArr/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 130000
STEPS: [0, 100000, 120000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * 2 (deconv output doubles resolution)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999553/12_2017_baselines/rpn_person_only_X-101-64x4d-FPN_1x.yaml.08_21_33.ghFzzArr/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35999553/12_2017_baselines/rpn_person_only_X-101-64x4d-FPN_1x.yaml.08_21_33.ghFzzArr/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35999553/12_2017_baselines/rpn_person_only_X-101-64x4d-FPN_1x.yaml.08_21_33.ghFzzArr/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
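# The remaining configs are the precomputed-proposal counterparts of the
# end-to-end Mask R-CNN baselines above: backbones, heads, and schedules match,
# and only the proposal source (PROPOSAL_FILES plus TEST.PROPOSAL_LIMIT)
# differs.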
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998887/12_2017_baselines/rpn_R-101-FPN_1x.yaml.08_07_07.vzhHEs0V/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v0upshare
RESOLUTION: 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_train/rpn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_valminusminival/rpn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_minival/rpn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
RPN:
SIZES: (32, 64, 128, 256, 512)
FAST_RCNN:
ROI_BOX_HEAD: ResNet.add_ResNet_roi_conv5_head
ROI_XFORM_METHOD: RoIAlign
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v0upshare
RESOLUTION: 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_train/rpn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_valminusminival/rpn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_minival/rpn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998814/12_2017_baselines/rpn_R-50-FPN_1x.yaml.08_06_03.Axg0r179/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/36760102/12_2017_baselines/rpn_X-101-32x8d-FPN_1x.yaml.06_00_16.RWeBAniO/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 1x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
# 2x schedule (note TRAIN.IMS_PER_BATCH: 1)
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 360000
STEPS: [0, 240000, 320000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
# md5sum of weights pkl file: aa14062280226e48f569ef1c7212e7c7
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (800,)
MAX_SIZE: 1333
IMS_PER_BATCH: 1
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998956/12_2017_baselines/rpn_X-101-64x4d-FPN_1x.yaml.08_08_41.Seh0psKz/output/test/coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
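# Anchor arithmetic sketch: SCALES_PER_OCTAVE (3) x len(ASPECT_RATIOS) (3)
# gives 9 anchors per location on each FPN level, with a base anchor size of
# roughly ANCHOR_SCALE x the level's stride.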
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: retinanet
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
EXTRA_CONV_LEVELS: True
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
RETINANET:
RETINANET_ON: True
NUM_CONVS: 4
ASPECT_RATIOS: (1.0, 2.0, 0.5)
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4
LOSS_GAMMA: 2.0
LOSS_ALPHA: 0.25
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_STRADDLE_THRESH: -1 # default 0
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 10000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('coco_2014_minival','coco_2014_train','coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: rpn
CONV_BODY: ResNet.add_ResNet50_conv4_body
NUM_CLASSES: 81
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
RPN:
SIZES: (32, 64, 128, 256, 512)
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('coco_2014_minival','coco_2014_train','coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
USE_NCCL: False
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('coco_2014_minival','coco_2014_train','coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('coco_2014_minival','coco_2014_train','coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 81
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('coco_2014_minival','coco_2014_train','coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('keypoints_coco_2014_minival', 'keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival', 'keypoints_coco_2015_test')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 2
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('keypoints_coco_2014_minival', 'keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival', 'keypoints_coco_2015_test')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 32
WIDTH_PER_GROUP: 8
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/20171220/X-101-32x8d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('keypoints_coco_2014_minival', 'keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival', 'keypoints_coco_2015_test')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
NUM_CLASSES: 2
RPN_ONLY: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_RPN: True
RPN_MAX_LEVEL: 6
RPN_MIN_LEVEL: 2
RPN_ANCHOR_START_SIZE: 32
RPN_ASPECT_RATIOS: (0.5, 1, 2)
RESNETS:
STRIDE_1X1: False # default True for MSRA; False for C2 or Torch models
TRANS_FUNC: bottleneck_transformation
NUM_GROUPS: 64
WIDTH_PER_GROUP: 4
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
TEST:
DATASETS: ('keypoints_coco_2014_minival', 'keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival', 'keypoints_coco_2015_test')
SCALES: (800,)
MAX_SIZE: 1333
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 2000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 1
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.0025
GAMMA: 0.1
MAX_ITER: 60000
STEPS: [0, 30000, 40000]
# Equivalent schedules with...
# 1 GPU:
# BASE_LR: 0.0025
# MAX_ITER: 60000
# STEPS: [0, 30000, 40000]
# 2 GPUs:
# BASE_LR: 0.005
# MAX_ITER: 30000
# STEPS: [0, 15000, 20000]
# 4 GPUs:
# BASE_LR: 0.01
# MAX_ITER: 15000
# STEPS: [0, 7500, 10000]
# 8 GPUs:
# BASE_LR: 0.02
# MAX_ITER: 7500
# STEPS: [0, 3750, 5000]
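# These four schedules follow the linear scaling rule: with k GPUs the
# effective minibatch is k times larger, so BASE_LR is multiplied by k while
# MAX_ITER and STEPS are divided by k (e.g., 8 GPUs: 0.0025 * 8 = 0.02 and
# 60000 / 8 = 7500), keeping the total number of training images seen constant.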
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train',)
SCALES: (500,)
MAX_SIZE: 833
BATCH_SIZE_PER_IM: 256
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (500,)
MAX_SIZE: 833
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 2
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.005
GAMMA: 0.1
MAX_ITER: 30000
STEPS: [0, 15000, 20000]
# Equivalent schedules with...
# 1 GPU:
# BASE_LR: 0.0025
# MAX_ITER: 60000
# STEPS: [0, 30000, 40000]
# 2 GPUs:
# BASE_LR: 0.005
# MAX_ITER: 30000
# STEPS: [0, 15000, 20000]
# 4 GPUs:
# BASE_LR: 0.01
# MAX_ITER: 15000
# STEPS: [0, 7500, 10000]
# 8 GPUs:
# BASE_LR: 0.02
# MAX_ITER: 7500
# STEPS: [0, 3750, 5000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train',)
SCALES: (500,)
MAX_SIZE: 833
BATCH_SIZE_PER_IM: 256
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (500,)
MAX_SIZE: 833
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 4
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.01
GAMMA: 0.1
MAX_ITER: 15000
STEPS: [0, 7500, 10000]
# Equivalent schedules with...
# 1 GPU:
# BASE_LR: 0.0025
# MAX_ITER: 60000
# STEPS: [0, 30000, 40000]
# 2 GPUs:
# BASE_LR: 0.005
# MAX_ITER: 30000
# STEPS: [0, 15000, 20000]
# 4 GPUs:
# BASE_LR: 0.01
# MAX_ITER: 15000
# STEPS: [0, 7500, 10000]
# 8 GPUs:
# BASE_LR: 0.02
# MAX_ITER: 7500
# STEPS: [0, 3750, 5000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train',)
SCALES: (500,)
MAX_SIZE: 833
BATCH_SIZE_PER_IM: 256
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (500,)
MAX_SIZE: 833
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: generalized_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 7500
STEPS: [0, 3750, 5000]
# Equivalent schedules with...
# 1 GPU:
# BASE_LR: 0.0025
# MAX_ITER: 60000
# STEPS: [0, 30000, 40000]
# 2 GPUs:
# BASE_LR: 0.005
# MAX_ITER: 30000
# STEPS: [0, 15000, 20000]
# 4 GPUs:
# BASE_LR: 0.01
# MAX_ITER: 15000
# STEPS: [0, 7500, 10000]
# 8 GPUs:
# BASE_LR: 0.02
# MAX_ITER: 7500
# STEPS: [0, 3750, 5000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train',)
SCALES: (500,)
MAX_SIZE: 833
BATCH_SIZE_PER_IM: 256
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (500,)
MAX_SIZE: 833
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
MODEL:
TYPE: mask_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 81
FASTER_RCNN: True
MASK_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 180000
STEPS: [0, 120000, 160000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
RESOLUTION: 28 # (output mask resolution) default 14
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14 # default 7
ROI_XFORM_SAMPLING_RATIO: 2 # default 0
DILATION: 1 # default 2
CONV_INIT: MSRAFill # default GaussianFill
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
SCALES: (800,)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
RPN_PRE_NMS_TOP_N: 2000 # Per FPN level
TEST:
DATASETS: ('coco_2014_minival',)
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
RPN_PRE_NMS_TOP_N: 1000 # Per FPN level
RPN_POST_NMS_TOP_N: 1000
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/35857389/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml.01_37_22.KSeq0b5q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl
# -- Test time augmentation example -- #
BBOX_AUG:
ENABLED: True
SCORE_HEUR: UNION # NOTE: AVG cannot be used with an e2e model
COORD_HEUR: UNION # NOTE: AVG cannot be used with an e2e model
H_FLIP: True
SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
MAX_SIZE: 2000
SCALE_H_FLIP: True
SCALE_SIZE_DEP: False
AREA_TH_LO: 2500 # 50^2
AREA_TH_HI: 32400 # 180^2
ASPECT_RATIOS: ()
ASPECT_RATIO_H_FLIP: False
MASK_AUG:
ENABLED: True
HEUR: SOFT_AVG
H_FLIP: True
SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
MAX_SIZE: 2000
SCALE_H_FLIP: True
SCALE_SIZE_DEP: False
AREA_TH: 32400 # 180^2
ASPECT_RATIOS: ()
ASPECT_RATIO_H_FLIP: False
BBOX_VOTE:
ENABLED: True
VOTE_TH: 0.9
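# Sketch of the combined effect of the options above: each test image is run
# at every scale in SCALES plus horizontal flips, the per-augmentation
# detections are merged with the UNION heuristics, and boxes overlapping a
# kept detection by IoU >= VOTE_TH contribute to its final coordinates.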
# -- Test time augmentation example -- #
USE_NCCL: False
OUTPUT_DIR: .
MODEL:
TYPE: keypoint_rcnn
CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
NUM_CLASSES: 2
KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
WEIGHT_DECAY: 0.0001
LR_POLICY: steps_with_decay
BASE_LR: 0.02
GAMMA: 0.1
MAX_ITER: 90000
STEPS: [0, 60000, 80000]
FPN:
FPN_ON: True
MULTILEVEL_ROIS: True
MULTILEVEL_RPN: True # accidentally True; disable in the future
FAST_RCNN:
ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
NUM_STACKED_CONVS: 8
NUM_KEYPOINTS: 17
USE_DECONV_OUTPUT: True
CONV_INIT: MSRAFill
CONV_HEAD_DIM: 512
UP_SCALE: 2
HEATMAP_SIZE: 56 # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
ROI_XFORM_SAMPLING_RATIO: 2
KEYPOINT_CONFIDENCE: bbox
TRAIN:
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_train/generalized_rcnn/rpn_proposals.pkl', 'https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_valminusminival/generalized_rcnn/rpn_proposals.pkl')
SCALES: (640, 672, 704, 736, 768, 800)
MAX_SIZE: 1333
BATCH_SIZE_PER_IM: 512
TEST:
DATASETS: ('keypoints_coco_2014_minival',)
PROPOSAL_FILES: ('https://s3-us-west-2.amazonaws.com/detectron/35998996/12_2017_baselines/rpn_person_only_R-50-FPN_1x.yaml.08_10_08.0ZWmJm6F/output/test/keypoints_coco_2014_minival/generalized_rcnn/rpn_proposals.pkl',)
PROPOSAL_LIMIT: 1000
SCALES: (800,)
MAX_SIZE: 1333
NMS: 0.5
WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/37651887/12_2017_baselines/keypoint_rcnn_R-50-FPN_s1x.yaml.20_01_40.FDjUQ7VX/output/train/keypoints_coco_2014_train%3Akeypoints_coco_2014_valminusminival/generalized_rcnn/model_final.pkl
# -- Test time augmentation example -- #
BBOX_AUG:
ENABLED: True
SCORE_HEUR: AVG
COORD_HEUR: AVG
H_FLIP: True
SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
MAX_SIZE: 2000
SCALE_H_FLIP: True
SCALE_SIZE_DEP: False
AREA_TH_LO: 2500 # 50^2
AREA_TH_HI: 32400 # 180^2
KPS_AUG:
ENABLED: True
HEUR: HM_AVG
H_FLIP: True
SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
MAX_SIZE: 2000
SCALE_H_FLIP: True
SCALE_SIZE_DEP: True
AREA_TH: 22500 # 150^2
ASPECT_RATIOS: ()
ASPECT_RATIO_H_FLIP: False
# -- Test time augmentation example -- #
OUTPUT_DIR: .
The demo images are licensed as United States government work:
https://www.usa.gov/government-works
The image files were obtained on Jan 13, 2018 from the following URLs.
16004479832_a748d55f21_k.jpg
https://www.flickr.com/photos/archivesnews/16004479832
18124840932_e42b3e377c_k.jpg
https://www.flickr.com/photos/usnavy/18124840932
33887522274_eebd074106_k.jpg
https://www.flickr.com/photos/usaid_pakistan/33887522274
15673749081_767a7fa63a_k.jpg
https://www.flickr.com/photos/usnavy/15673749081
34501842524_3c858b3080_k.jpg
https://www.flickr.com/photos/departmentofenergy/34501842524
24274813513_0cfd2ce6d0_k.jpg
https://www.flickr.com/photos/dhsgov/24274813513
19064748793_bb942deea1_k.jpg
https://www.flickr.com/photos/statephotos/19064748793
33823288584_1d21cf0a26_k.jpg
https://www.flickr.com/photos/cbpphotos/33823288584
17790319373_bd19b24cfc_k.jpg
https://www.flickr.com/photos/secdef/17790319373
# Use Caffe2 image as parent image
FROM caffe2:cuda8-cudnn6-all-options
# Install Python dependencies
RUN pip install numpy pyyaml matplotlib 'opencv-python>=3.0' setuptools Cython mock
# Install the COCO API
RUN git clone https://github.com/cocodataset/cocoapi.git
WORKDIR /cocoapi/PythonAPI
RUN make install
# Clone the Detectron repository
RUN git clone https://github.com/facebookresearch/detectron /detectron
# Set up Python modules
WORKDIR /detectron/lib
RUN make
# Build custom ops
RUN make ops
# Go to Detectron root
WORKDIR /detectron
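# Usage sketch (assumes the caffe2:cuda8-cudnn6-all-options parent image is
# available locally and nvidia-docker is installed):
#   docker build -t detectron .
#   nvidia-docker run --rm -it detectron python2 tools/infer_simple.py --help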
cmake_minimum_required(VERSION 2.8.12 FATAL_ERROR)
# Add CMake modules.
list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake/Modules)
# Add compiler flags.
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=c11")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -O2 -fPIC -Wno-narrowing")
# Include Caffe2 CMake utils.
include(cmake/Utils.cmake)
# Find dependencies.
include(cmake/Dependencies.cmake)
# Print configuration summary.
include(cmake/Summary.cmake)
detectron_print_config_summary()
# Collect custom ops sources.
file(GLOB CUSTOM_OPS_CPU_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/ops/*.cc)
file(GLOB CUSTOM_OPS_GPU_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/ops/*.cu)
# Install custom CPU ops lib.
add_library(
caffe2_detectron_custom_ops SHARED
${CUSTOM_OPS_CPU_SRCS})
target_link_libraries(caffe2_detectron_custom_ops caffe2)
install(TARGETS caffe2_detectron_custom_ops DESTINATION lib)
# Install custom GPU ops lib.
if (HAVE_CUDA)
CUDA_ADD_LIBRARY(
caffe2_detectron_custom_ops_gpu SHARED
${CUSTOM_OPS_CPU_SRCS}
${CUSTOM_OPS_GPU_SRCS})
target_link_libraries(caffe2_detectron_custom_ops_gpu caffe2_gpu)
install(TARGETS caffe2_detectron_custom_ops_gpu DESTINATION lib)
endif()
# Don't use the --user flag for setup.py develop mode with virtualenv.
DEV_USER_FLAG=$(shell python2 -c "import sys; print('' if hasattr(sys, 'real_prefix') else '--user')")
.PHONY: default
default: dev
.PHONY: install
install:
python2 setup.py install
.PHONY: ops
ops:
mkdir -p build && cd build && cmake .. && make -j$(shell nproc)
.PHONY: dev
dev:
python2 setup.py develop $(DEV_USER_FLAG)
.PHONY: clean
clean:
python2 setup.py develop --uninstall $(DEV_USER_FLAG)
rm -rf build
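# Usage sketch: `make` (the default `dev` target) installs the package in
# develop mode, `make ops` builds the custom Caffe2 ops into ./build via
# CMake, and `make clean` reverses both.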
# Copied from https://github.com/caffe2/caffe2/blob/master/cmake/Cuda.cmake
# Caffe2 cmake utility to prepare for cuda build.
# This cmake file is called from Dependencies.cmake. You do not need to
# manually invoke it.
# Known NVIDIA GPU architectures Caffe2 can be compiled for.
# Default is set to CUDA 9. If we detect the CUDA version to be less than
# 9, we will lower it to the corresponding known archs.
set(Caffe2_known_gpu_archs "30 35 50 52 60 61 70") # for CUDA 9.x
set(Caffe2_known_gpu_archs8 "20 21(20) 30 35 50 52 60 61") # for CUDA 8.x
set(Caffe2_known_gpu_archs7 "20 21(20) 30 35 50 52") # for CUDA 7.x
################################################################################################
# Function for selecting GPU arch flags for nvcc based on CUDA_ARCH_NAME
# Usage:
# caffe2_select_nvcc_arch_flags(out_variable)
function(caffe2_select_nvcc_arch_flags out_variable)
# List of arch names
set(__archs_names "Kepler" "Maxwell" "Pascal" "Volta" "All" "Manual")
set(__archs_name_default "All")
# Set CUDA_ARCH_NAME strings (so it will be shown as a drop-down list in the CMake GUI)
set(CUDA_ARCH_NAME ${__archs_name_default} CACHE STRING "Select target NVIDIA GPU architecture")
set_property(CACHE CUDA_ARCH_NAME PROPERTY STRINGS "" ${__archs_names})
mark_as_advanced(CUDA_ARCH_NAME)
# Verify CUDA_ARCH_NAME value
if(NOT ";${__archs_names};" MATCHES ";${CUDA_ARCH_NAME};")
string(REPLACE ";" ", " __archs_names "${__archs_names}")
message(FATAL_ERROR "Invalid CUDA_ARCH_NAME, supported values: ${__archs_names}. Got ${CUDA_ARCH_NAME}")
endif()
if(${CUDA_ARCH_NAME} STREQUAL "Manual")
set(CUDA_ARCH_BIN "" CACHE STRING
"Specify GPU architectures to build binaries for (BIN(PTX) format is supported)")
set(CUDA_ARCH_PTX "" CACHE STRING
"Specify GPU architectures to build PTX intermediate code for")
mark_as_advanced(CUDA_ARCH_BIN CUDA_ARCH_PTX)
else()
unset(CUDA_ARCH_BIN CACHE)
unset(CUDA_ARCH_PTX CACHE)
endif()
if(${CUDA_ARCH_NAME} STREQUAL "Kepler")
set(__cuda_arch_bin "30 35")
elseif(${CUDA_ARCH_NAME} STREQUAL "Maxwell")
set(__cuda_arch_bin "50")
elseif(${CUDA_ARCH_NAME} STREQUAL "Pascal")
set(__cuda_arch_bin "60 61")
elseif(${CUDA_ARCH_NAME} STREQUAL "Volta")
set(__cuda_arch_bin "70")
elseif(${CUDA_ARCH_NAME} STREQUAL "All")
set(__cuda_arch_bin ${Caffe2_known_gpu_archs})
elseif(${CUDA_ARCH_NAME} STREQUAL "Manual")
set(__cuda_arch_bin ${CUDA_ARCH_BIN})
set(__cuda_arch_ptx ${CUDA_ARCH_PTX})
else()
message(FATAL_ERROR "Invalid CUDA_ARCH_NAME")
endif()
# Remove dots and convert to lists
string(REGEX REPLACE "\\." "" __cuda_arch_bin "${__cuda_arch_bin}")
string(REGEX REPLACE "\\." "" __cuda_arch_ptx "${__cuda_arch_ptx}")
string(REGEX MATCHALL "[0-9()]+" __cuda_arch_bin "${__cuda_arch_bin}")
string(REGEX MATCHALL "[0-9]+" __cuda_arch_ptx "${__cuda_arch_ptx}")
list(REMOVE_DUPLICATES __cuda_arch_bin)
list(REMOVE_DUPLICATES __cuda_arch_ptx)
set(__nvcc_flags "")
set(__nvcc_archs_readable "")
# Tell NVCC to add binaries for the specified GPUs
foreach(__arch ${__cuda_arch_bin})
if(__arch MATCHES "([0-9]+)\\(([0-9]+)\\)")
# User explicitly specified PTX for the concrete BIN
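# e.g. an entry "21(20)" emits: -gencode arch=compute_20,code=sm_21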
list(APPEND __nvcc_flags -gencode arch=compute_${CMAKE_MATCH_2},code=sm_${CMAKE_MATCH_1})
list(APPEND __nvcc_archs_readable sm_${CMAKE_MATCH_1})
else()
# User didn't explicitly specify PTX for the concrete BIN, we assume PTX=BIN
list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=sm_${__arch})
list(APPEND __nvcc_archs_readable sm_${__arch})
endif()
endforeach()
# Tell NVCC to add PTX intermediate code for the specified architectures
foreach(__arch ${__cuda_arch_ptx})
list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=compute_${__arch})
list(APPEND __nvcc_archs_readable compute_${__arch})
endforeach()
string(REPLACE ";" " " __nvcc_archs_readable "${__nvcc_archs_readable}")
set(${out_variable} ${__nvcc_flags} PARENT_SCOPE)
set(${out_variable}_readable ${__nvcc_archs_readable} PARENT_SCOPE)
endfunction()
################################################################################################
# Short command for cuda compilation
# Usage:
# caffe2_cuda_compile(<objlist_variable> <cuda_files>)
macro(caffe2_cuda_compile objlist_variable)
foreach(var CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_RELEASE CMAKE_CXX_FLAGS_DEBUG)
set(${var}_backup_in_cuda_compile_ "${${var}}")
# we remove /EHa as it generates warnings under windows
string(REPLACE "/EHa" "" ${var} "${${var}}")
endforeach()
if(APPLE)
list(APPEND CUDA_NVCC_FLAGS -Xcompiler -Wno-unused-function)
endif()
cuda_compile(cuda_objcs ${ARGN})
foreach(var CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_RELEASE CMAKE_CXX_FLAGS_DEBUG)
set(${var} "${${var}_backup_in_cuda_compile_}")
unset(${var}_backup_in_cuda_compile_)
endforeach()
set(${objlist_variable} ${cuda_objcs})
endmacro()
################################################################################################
### Non macro section
################################################################################################
# Special care for the Windows platform: we know that 32-bit Windows does not support CUDA.
if(${CMAKE_SYSTEM_NAME} STREQUAL "Windows")
if(NOT (CMAKE_SIZEOF_VOID_P EQUAL 8))
message(FATAL_ERROR
"CUDA support not available with 32-bit windows. Did you "
"forget to set Win64 in the generator target?")
return()
endif()
endif()
find_package(CUDA 7.0 QUIET)
find_cuda_helper_libs(curand) # CMake 2.8.7 compatibility; that version doesn't search for curand
if(NOT CUDA_FOUND)
set(HAVE_CUDA FALSE)
return()
endif()
set(HAVE_CUDA TRUE)
message(STATUS "CUDA detected: " ${CUDA_VERSION})
if (${CUDA_VERSION} LESS 7.0)
message(FATAL_ERROR "Caffe2 requires CUDA 7.0 or later version")
elseif (${CUDA_VERSION} LESS 8.0) # CUDA 7.x
set(Caffe2_known_gpu_archs ${Caffe2_known_gpu_archs7})
list(APPEND CUDA_NVCC_FLAGS "-D_MWAITXINTRIN_H_INCLUDED")
list(APPEND CUDA_NVCC_FLAGS "-D__STRICT_ANSI__")
elseif (${CUDA_VERSION} LESS 9.0) # CUDA 8.x
set(Caffe2_known_gpu_archs ${Caffe2_known_gpu_archs8})
list(APPEND CUDA_NVCC_FLAGS "-D_MWAITXINTRIN_H_INCLUDED")
list(APPEND CUDA_NVCC_FLAGS "-D__STRICT_ANSI__")
# CUDA 8 may complain that sm_20 is no longer supported. Suppress the
# warning for now.
list(APPEND CUDA_NVCC_FLAGS "-Wno-deprecated-gpu-targets")
endif()
caffe2_include_directories(${CUDA_INCLUDE_DIRS})
list(APPEND Caffe2_CUDA_DEPENDENCY_LIBS ${CUDA_CUDART_LIBRARY}
${CUDA_curand_LIBRARY} ${CUDA_CUBLAS_LIBRARIES})
# Find libcuda.so and libnvrtc.so.
# For libcuda.so, we will find it under lib, lib64, and then the stubs
# folder, in case we are building on a system that does not have the CUDA
# driver installed. On Windows, we also search under the folder lib/x64.
find_library(CUDA_CUDA_LIB cuda
PATHS ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 lib/stubs lib64/stubs lib/x64)
find_library(CUDA_NVRTC_LIB nvrtc
PATHS ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 lib/x64)
# setting nvcc arch flags
caffe2_select_nvcc_arch_flags(NVCC_FLAGS_EXTRA)
list(APPEND CUDA_NVCC_FLAGS ${NVCC_FLAGS_EXTRA})
message(STATUS "Added CUDA NVCC flags for: ${NVCC_FLAGS_EXTRA_readable}")
if(CUDA_CUDA_LIB)
message(STATUS "Found libcuda: ${CUDA_CUDA_LIB}")
list(APPEND Caffe2_CUDA_DEPENDENCY_LIBS ${CUDA_CUDA_LIB})
else()
message(FATAL_ERROR "Cannot find libcuda.so. Please file an issue on https://github.com/caffe2/caffe2 with your build output.")
endif()
if(CUDA_NVRTC_LIB)
message(STATUS "Found libnvrtc: ${CUDA_NVRTC_LIB}")
list(APPEND Caffe2_CUDA_DEPENDENCY_LIBS ${CUDA_NVRTC_LIB})
else()
message(FATAL_ERROR "Cannot find libnvrtc.so. Please file an issue on https://github.com/caffe2/caffe2 with your build output.")
endif()
# Disable some nvcc diagnostics that appear in boost, glog, gflags, opencv, etc.
foreach(diag cc_clobber_ignored integer_sign_change useless_using_declaration set_but_not_used)
list(APPEND CUDA_NVCC_FLAGS -Xcudafe --diag_suppress=${diag})
endforeach()
# Set C++11 support
set(CUDA_PROPAGATE_HOST_FLAGS OFF)
if (NOT MSVC)
list(APPEND CUDA_NVCC_FLAGS "-std=c++11")
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler -fPIC")
endif()
# Debug and Release symbol support
if (MSVC)
if (${CMAKE_BUILD_TYPE} MATCHES "Release")
if (${BUILD_SHARED_LIBS})
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler -MD")
else()
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler -MT")
endif()
elseif(${CMAKE_BUILD_TYPE} MATCHES "Debug")
message(FATAL_ERROR
"Caffe2 currently does not support the combination of MSVC, Cuda "
"and Debug mode. Either set USE_CUDA=OFF or set the build type "
"to Release")
if (${BUILD_SHARED_LIBS})
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler -MDd")
else()
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler -MTd")
endif()
else()
message(FATAL_ERROR "Unknown cmake build type: " ${CMAKE_BUILD_TYPE})
endif()
endif()
if(OpenMP_FOUND)
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler ${OpenMP_CXX_FLAGS}")
endif()
# Set --expt-relaxed-constexpr to suppress Eigen warnings
list(APPEND CUDA_NVCC_FLAGS "--expt-relaxed-constexpr")
mark_as_advanced(CUDA_BUILD_CUBIN CUDA_BUILD_EMULATION CUDA_VERBOSE_BUILD)
mark_as_advanced(CUDA_SDK_ROOT_DIR CUDA_SEPARABLE_COMPILATION)
# Adapted from https://github.com/caffe2/caffe2/blob/master/cmake/Dependencies.cmake
# Find the Caffe2 package.
# Caffe2 exports the required targets, so find_package should work for
# the standard Caffe2 installation. If you encounter problems with finding
# the Caffe2 package, make sure you have run `make install` when installing
# Caffe2 (`make install` populates your share/cmake/Caffe2).
find_package(Caffe2 REQUIRED)
# Find CUDA.
include(cmake/Cuda.cmake)
if (HAVE_CUDA)
# CUDA 9.x requires GCC version <= 6
if ((CUDA_VERSION VERSION_EQUAL 9.0) OR
(CUDA_VERSION VERSION_GREATER 9.0 AND CUDA_VERSION VERSION_LESS 10.0))
if (CMAKE_C_COMPILER_ID STREQUAL "GNU" AND
NOT CMAKE_C_COMPILER_VERSION VERSION_LESS 7.0 AND
CUDA_HOST_COMPILER STREQUAL CMAKE_C_COMPILER)
message(FATAL_ERROR
"CUDA ${CUDA_VERSION} is not compatible with GCC version >= 7. "
"Use the following option to use another version (for example): \n"
" -DCUDA_HOST_COMPILER=/usr/bin/gcc-6\n")
endif()
# CUDA 8.0 requires GCC version <= 5
elseif (CUDA_VERSION VERSION_EQUAL 8.0)
if (CMAKE_C_COMPILER_ID STREQUAL "GNU" AND
NOT CMAKE_C_COMPILER_VERSION VERSION_LESS 6.0 AND
CUDA_HOST_COMPILER STREQUAL CMAKE_C_COMPILER)
message(FATAL_ERROR
"CUDA 8.0 is not compatible with GCC version >= 6. "
"Use the following option to use another version (for example): \n"
" -DCUDA_HOST_COMPILER=/usr/bin/gcc-5\n")
endif()
endif()
endif()
# Find CUDNN.
if (HAVE_CUDA)
find_package(CuDNN REQUIRED)
if (CUDNN_FOUND)
caffe2_include_directories(${CUDNN_INCLUDE_DIRS})
endif()
endif()
# Copied from https://github.com/caffe2/caffe2/blob/master/cmake/Modules/FindCuDNN.cmake
# - Try to find cuDNN
#
# The following variables are optionally searched for defaults
# CUDNN_ROOT_DIR: Base directory where all cuDNN components are found
#
# The following are set after configuration is done:
# CUDNN_FOUND
# CUDNN_INCLUDE_DIRS
# CUDNN_LIBRARIES
# CUDNN_LIBRARY_DIRS
include(FindPackageHandleStandardArgs)
set(CUDNN_ROOT_DIR "" CACHE PATH "Folder contains NVIDIA cuDNN")
find_path(CUDNN_INCLUDE_DIR cudnn.h
HINTS ${CUDNN_ROOT_DIR} ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES cuda/include include)
find_library(CUDNN_LIBRARY cudnn
HINTS ${CUDNN_ROOT_DIR} ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 cuda/lib cuda/lib64 lib/x64)
find_package_handle_standard_args(
CUDNN DEFAULT_MSG CUDNN_INCLUDE_DIR CUDNN_LIBRARY)
if(CUDNN_FOUND)
# get cuDNN version
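# cudnn.h carries lines of the form "#define CUDNN_MAJOR 6"; the regexes
# below extract the MAJOR, MINOR and PATCHLEVEL numbers from them.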
file(READ ${CUDNN_INCLUDE_DIR}/cudnn.h CUDNN_HEADER_CONTENTS)
string(REGEX MATCH "define CUDNN_MAJOR * +([0-9]+)"
CUDNN_VERSION_MAJOR "${CUDNN_HEADER_CONTENTS}")
string(REGEX REPLACE "define CUDNN_MAJOR * +([0-9]+)" "\\1"
CUDNN_VERSION_MAJOR "${CUDNN_VERSION_MAJOR}")
string(REGEX MATCH "define CUDNN_MINOR * +([0-9]+)"
CUDNN_VERSION_MINOR "${CUDNN_HEADER_CONTENTS}")
string(REGEX REPLACE "define CUDNN_MINOR * +([0-9]+)" "\\1"
CUDNN_VERSION_MINOR "${CUDNN_VERSION_MINOR}")
string(REGEX MATCH "define CUDNN_PATCHLEVEL * +([0-9]+)"
CUDNN_VERSION_PATCH "${CUDNN_HEADER_CONTENTS}")
string(REGEX REPLACE "define CUDNN_PATCHLEVEL * +([0-9]+)" "\\1"
CUDNN_VERSION_PATCH "${CUDNN_VERSION_PATCH}")
# Assemble cuDNN version
if(NOT CUDNN_VERSION_MAJOR)
set(CUDNN_VERSION "?")
else()
set(CUDNN_VERSION "${CUDNN_VERSION_MAJOR}.${CUDNN_VERSION_MINOR}.${CUDNN_VERSION_PATCH}")
endif()
set(CUDNN_INCLUDE_DIRS ${CUDNN_INCLUDE_DIR})
set(CUDNN_LIBRARIES ${CUDNN_LIBRARY})
message(STATUS "Found cuDNN: v${CUDNN_VERSION} (include: ${CUDNN_INCLUDE_DIR}, library: ${CUDNN_LIBRARY})")
mark_as_advanced(CUDNN_ROOT_DIR CUDNN_LIBRARY CUDNN_INCLUDE_DIR)
endif()
# Adapted from https://github.com/caffe2/caffe2/blob/master/cmake/Summary.cmake
# Prints configuration summary.
function (detectron_print_config_summary)
message(STATUS "Summary:")
message(STATUS " CMake version : ${CMAKE_VERSION}")
message(STATUS " CMake command : ${CMAKE_COMMAND}")
message(STATUS " System name : ${CMAKE_SYSTEM_NAME}")
message(STATUS " C++ compiler : ${CMAKE_CXX_COMPILER}")
message(STATUS " C++ compiler version : ${CMAKE_CXX_COMPILER_VERSION}")
message(STATUS " CXX flags : ${CMAKE_CXX_FLAGS}")
message(STATUS " Caffe2 version : ${CAFFE2_VERSION}")
message(STATUS " Caffe2 include path : ${CAFFE2_INCLUDE_DIRS}")
message(STATUS " Have CUDA : ${HAVE_CUDA}")
if (${HAVE_CUDA})
message(STATUS " CUDA version : ${CUDA_VERSION}")
message(STATUS " CuDNN version : ${CUDNN_VERSION}")
endif()
endfunction()
# Copied from https://github.com/caffe2/caffe2/blob/master/cmake/Utils.cmake
################################################################################################
# Exclude and prepend functionalities
function (exclude OUTPUT INPUT)
set(EXCLUDES ${ARGN})
foreach(EXCLUDE ${EXCLUDES})
list(REMOVE_ITEM INPUT "${EXCLUDE}")
endforeach()
set(${OUTPUT} ${INPUT} PARENT_SCOPE)
endfunction(exclude)
function (prepend OUTPUT PREPEND)
set(OUT "")
foreach(ITEM ${ARGN})
list(APPEND OUT "${PREPEND}${ITEM}")
endforeach()
set(${OUTPUT} ${OUT} PARENT_SCOPE)
endfunction(prepend)
################################################################################################
# Clears variables from list
# Usage:
# caffe_clear_vars(<variables_list>)
macro(caffe_clear_vars)
foreach(_var ${ARGN})
unset(${_var})
endforeach()
endmacro()
################################################################################################
# Prints list element per line
# Usage:
# caffe_print_list(<list>)
function(caffe_print_list)
foreach(e ${ARGN})
message(STATUS ${e})
endforeach()
endfunction()
################################################################################################
# Reads set of version defines from the header file
# Usage:
# caffe_parse_header(<file> <define1> <define2> <define3> ..)
macro(caffe_parse_header FILENAME FILE_VAR)
set(vars_regex "")
set(__parent_scope OFF)
set(__add_cache OFF)
foreach(name ${ARGN})
if("${name}" STREQUAL "PARENT_SCOPE")
set(__parent_scope ON)
elseif("${name}" STREQUAL "CACHE")
set(__add_cache ON)
elseif(vars_regex)
set(vars_regex "${vars_regex}|${name}")
else()
set(vars_regex "${name}")
endif()
endforeach()
if(EXISTS "${FILENAME}")
file(STRINGS "${FILENAME}" ${FILE_VAR} REGEX "#define[ \t]+(${vars_regex})[ \t]+[0-9]+" )
else()
unset(${FILE_VAR})
endif()
foreach(name ${ARGN})
if(NOT "${name}" STREQUAL "PARENT_SCOPE" AND NOT "${name}" STREQUAL "CACHE")
if(${FILE_VAR})
if(${FILE_VAR} MATCHES ".+[ \t]${name}[ \t]+([0-9]+).*")
string(REGEX REPLACE ".+[ \t]${name}[ \t]+([0-9]+).*" "\\1" ${name} "${${FILE_VAR}}")
else()
set(${name} "")
endif()
if(__add_cache)
set(${name} ${${name}} CACHE INTERNAL "${name} parsed from ${FILENAME}" FORCE)
elseif(__parent_scope)
set(${name} "${${name}}" PARENT_SCOPE)
endif()
else()
unset(${name} CACHE)
endif()
endif()
endforeach()
endmacro()
################################################################################################
# Reads single version define from the header file and parses it
# Usage:
# caffe_parse_header_single_define(<library_name> <file> <define_name>)
function(caffe_parse_header_single_define LIBNAME HDR_PATH VARNAME)
set(${LIBNAME}_H "")
if(EXISTS "${HDR_PATH}")
file(STRINGS "${HDR_PATH}" ${LIBNAME}_H REGEX "^#define[ \t]+${VARNAME}[ \t]+\"[^\"]*\".*$" LIMIT_COUNT 1)
endif()
if(${LIBNAME}_H)
string(REGEX REPLACE "^.*[ \t]${VARNAME}[ \t]+\"([0-9]+).*$" "\\1" ${LIBNAME}_VERSION_MAJOR "${${LIBNAME}_H}")
string(REGEX REPLACE "^.*[ \t]${VARNAME}[ \t]+\"[0-9]+\\.([0-9]+).*$" "\\1" ${LIBNAME}_VERSION_MINOR "${${LIBNAME}_H}")
string(REGEX REPLACE "^.*[ \t]${VARNAME}[ \t]+\"[0-9]+\\.[0-9]+\\.([0-9]+).*$" "\\1" ${LIBNAME}_VERSION_PATCH "${${LIBNAME}_H}")
set(${LIBNAME}_VERSION_MAJOR ${${LIBNAME}_VERSION_MAJOR} ${ARGN} PARENT_SCOPE)
set(${LIBNAME}_VERSION_MINOR ${${LIBNAME}_VERSION_MINOR} ${ARGN} PARENT_SCOPE)
set(${LIBNAME}_VERSION_PATCH ${${LIBNAME}_VERSION_PATCH} ${ARGN} PARENT_SCOPE)
set(${LIBNAME}_VERSION_STRING "${${LIBNAME}_VERSION_MAJOR}.${${LIBNAME}_VERSION_MINOR}.${${LIBNAME}_VERSION_PATCH}" PARENT_SCOPE)
# append a TWEAK version if it exists:
set(${LIBNAME}_VERSION_TWEAK "")
if("${${LIBNAME}_H}" MATCHES "^.*[ \t]${VARNAME}[ \t]+\"[0-9]+\\.[0-9]+\\.[0-9]+\\.([0-9]+).*$")
set(${LIBNAME}_VERSION_TWEAK "${CMAKE_MATCH_1}" ${ARGN} PARENT_SCOPE)
endif()
if(${LIBNAME}_VERSION_TWEAK)
set(${LIBNAME}_VERSION_STRING "${${LIBNAME}_VERSION_STRING}.${${LIBNAME}_VERSION_TWEAK}" ${ARGN} PARENT_SCOPE)
else()
set(${LIBNAME}_VERSION_STRING "${${LIBNAME}_VERSION_STRING}" ${ARGN} PARENT_SCOPE)
endif()
endif()
endfunction()
########################################################################################################
# An option that the user can select. Accepts a condition to control when the option is available to the user.
# Usage:
# caffe_option(<option_variable> "doc string" <initial value or boolean expression> [IF <condition>])
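# Example (hypothetical option name and condition):
#   caffe_option(USE_FOO "Build with foo support" ON IF HAVE_CUDA)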
function(caffe_option variable description value)
set(__value ${value})
set(__condition "")
set(__varname "__value")
foreach(arg ${ARGN})
if(arg STREQUAL "IF" OR arg STREQUAL "if")
set(__varname "__condition")
else()
list(APPEND ${__varname} ${arg})
endif()
endforeach()
unset(__varname)
if("${__condition}" STREQUAL "")
set(__condition 2 GREATER 1)
endif()
if(${__condition})
if("${__value}" MATCHES ";")
if(${__value})
option(${variable} "${description}" ON)
else()
option(${variable} "${description}" OFF)
endif()
elseif(DEFINED ${__value})
if(${__value})
option(${variable} "${description}" ON)
else()
option(${variable} "${description}" OFF)
endif()
else()
option(${variable} "${description}" ${__value})
endif()
else()
unset(${variable} CACHE)
endif()
endfunction()
##############################################################################
# Helper function to add as-needed flag around a library.
function(caffe_add_as_needed_flag lib output_var)
if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
# TODO: Clang seems to not need this flag. Double check.
set(${output_var} ${lib} PARENT_SCOPE)
elseif(MSVC)
# TODO: check what is the behavior of MSVC.
# In MSVC, we add the whole archive by default.
set(${output_var} ${lib} PARENT_SCOPE)
else()
# Assume everything else is like gcc: we will need as-needed flag.
set(${output_var} -Wl,--no-as-needed ${lib} -Wl,--as-needed PARENT_SCOPE)
endif()
endfunction()
##############################################################################
# Helper function to add whole_archive flag around a library.
function(caffe_add_whole_archive_flag lib output_var)
if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
set(${output_var} -Wl,-force_load,$<TARGET_FILE:${lib}> PARENT_SCOPE)
elseif(MSVC)
# In MSVC, we add the whole archive by default.
set(${output_var} -WHOLEARCHIVE:$<TARGET_FILE:${lib}> PARENT_SCOPE)
else()
# Assume everything else is like gcc
set(${output_var} -Wl,--whole-archive ${lib} -Wl,--no-whole-archive PARENT_SCOPE)
endif()
endfunction()
##############################################################################
# Helper function to add either as-needed, or whole_archive flag around a library.
function(caffe_add_linker_flag lib output_var)
if (BUILD_SHARED_LIBS)
caffe_add_as_needed_flag(${lib} tmp)
else()
caffe_add_whole_archive_flag(${lib} tmp)
endif()
set(${output_var} ${tmp} PARENT_SCOPE)
endfunction()
##############################################################################
# Helper function to automatically generate __init__.py files where Python
# sources reside but no __init__.py is present.
function(caffe_autogen_init_py_files)
file(GLOB_RECURSE all_python_files RELATIVE ${PROJECT_SOURCE_DIR}
"${PROJECT_SOURCE_DIR}/caffe2/*.py")
set(python_paths_need_init_py)
foreach(python_file ${all_python_files})
get_filename_component(python_path ${python_file} PATH)
string(REPLACE "/" ";" path_parts ${python_path})
set(rebuilt_path ${CMAKE_BINARY_DIR})
foreach(path_part ${path_parts})
set(rebuilt_path "${rebuilt_path}/${path_part}")
list(APPEND python_paths_need_init_py ${rebuilt_path})
endforeach()
endforeach()
list(REMOVE_DUPLICATES python_paths_need_init_py)
# Since the _pb2.py files are yet to be created, we will need to manually
# add them to the list.
list(APPEND python_paths_need_init_py ${CMAKE_BINARY_DIR}/caffe)
list(APPEND python_paths_need_init_py ${CMAKE_BINARY_DIR}/caffe/proto)
list(APPEND python_paths_need_init_py ${CMAKE_BINARY_DIR}/caffe2/proto)
foreach(tmp ${python_paths_need_init_py})
if(NOT EXISTS ${tmp}/__init__.py)
# message(STATUS "Generate " ${tmp}/__init__.py)
file(WRITE ${tmp}/__init__.py "")
endif()
endforeach()
endfunction()
##############################################################################
# Creating a Caffe2 binary target with sources specified with relative path.
# Usage:
# caffe2_binary_target(target_name_or_src <src1> [<src2>] [<src3>] ...)
# If only target_name_or_src is specified, the target is built from that
# single source file and the target name is auto-generated from the filename.
# Otherwise, the target name is given by the first argument and the rest are
# the source files used to build the target.
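# Example (hypothetical source file): caffe2_binary_target(convert_db.cc)
# builds an executable target named "convert_db" from that single source.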
function(caffe2_binary_target target_name_or_src)
if (${ARGN})
set(__target ${target_name_or_src})
prepend(__srcs "${CMAKE_CURRENT_SOURCE_DIR}/" "${ARGN}")
else()
get_filename_component(__target ${target_name_or_src} NAME_WE)
prepend(__srcs "${CMAKE_CURRENT_SOURCE_DIR}/" "${target_name_or_src}")
endif()
add_executable(${__target} ${__srcs})
add_dependencies(${__target} ${Caffe2_MAIN_LIBS_ORDER})
target_link_libraries(${__target} ${Caffe2_MAIN_LIBS} ${Caffe2_DEPENDENCY_LIBS})
install(TARGETS ${__target} DESTINATION bin)
endfunction()
##############################################################################
# Helper function to add paths to system include directories.
#
# Anaconda distributions typically contain a lot of packages and some
# of those can conflict with headers/libraries that must be sourced
# from elsewhere. This helper ensures that Anaconda paths are always
# added AFTER other include paths, so that they do not accidentally
# take precedence when they shouldn't.
#
# This is just a heuristic and does not have any guarantees. We can
# add other corner cases here (as long as they are generic enough).
# A complete include path cross checker is a final resort if this
# hacky approach proves insufficient.
#
function(caffe2_include_directories)
foreach(path IN LISTS ARGN)
if (${path} MATCHES "/anaconda")
include_directories(AFTER SYSTEM ${path})
else()
include_directories(BEFORE SYSTEM ${path})
endif()
endforeach()
endfunction()
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Detectron config system.
This file specifies default config options for Detectron. You should not
change values in this file. Instead, you should write a config file (in yaml)
and use merge_cfg_from_file(yaml_file) to load it and override the default
options.
Most tools in the tools directory take a --cfg option to specify an override
file and an optional list of override (key, value) pairs:
- See tools/{train,test}_net.py for example code that uses merge_cfg_from_file
- See configs/*/*.yaml for example config files
Detectron supports a lot of different model types, each of which has a lot of
different options. The result is a HUGE set of configuration options.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from ast import literal_eval
from past.builtins import basestring
from utils.collections import AttrDict
import copy
import logging
import numpy as np
import os
import os.path as osp
import yaml
from utils.io import cache_url
logger = logging.getLogger(__name__)
__C = AttrDict()
# Consumers can get config by:
# from core.config import cfg
cfg = __C
# Random note: avoid using '.ON' as a config key since yaml converts it to True;
# prefer 'ENABLED' instead
# ---------------------------------------------------------------------------- #
# Training options
# ---------------------------------------------------------------------------- #
__C.TRAIN = AttrDict()
# Initialize network with weights from this .pkl file
__C.TRAIN.WEIGHTS = b''
# Datasets to train on
# Available dataset list: datasets.dataset_catalog.DATASETS.keys()
# If multiple datasets are listed, the model is trained on their union
__C.TRAIN.DATASETS = ()
# Scales to use during training
# Each scale is the pixel size of an image's shortest side
# If multiple scales are listed, then one is selected uniformly at random for
# each training image (i.e., scale jitter data augmentation)
__C.TRAIN.SCALES = (600, )
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 1000
# Images *per GPU* in the training minibatch
# Total images per minibatch = TRAIN.IMS_PER_BATCH * NUM_GPUS
__C.TRAIN.IMS_PER_BATCH = 2
# RoI minibatch size *per image* (number of regions of interest [ROIs])
# Total number of RoIs per training minibatch =
# TRAIN.BATCH_SIZE_PER_IM * TRAIN.IMS_PER_BATCH * NUM_GPUS
# E.g., a common configuration is: 512 * 2 * 8 = 8192
__C.TRAIN.BATCH_SIZE_PER_IM = 64
# Target fraction of RoI minibatch that is labeled foreground (i.e. class > 0)
__C.TRAIN.FG_FRACTION = 0.25
# Overlap threshold for an RoI to be considered foreground (if >= FG_THRESH)
__C.TRAIN.FG_THRESH = 0.5
# Overlap threshold for an RoI to be considered background (class = 0 if
# overlap in [LO, HI))
__C.TRAIN.BG_THRESH_HI = 0.5
__C.TRAIN.BG_THRESH_LO = 0.0
# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = True
# Overlap required between an RoI and a ground-truth box in order for that
# (RoI, gt box) pair to be used as a bounding-box regression training example
__C.TRAIN.BBOX_THRESH = 0.5
# Snapshot (model checkpoint) period
# Divide by NUM_GPUS to determine actual period (e.g., 20000/8 => 2500 iters)
# to allow for linear training schedule scaling
__C.TRAIN.SNAPSHOT_ITERS = 20000
# Train using these proposals
# During training, all proposals specified in the file are used (no limit is
# applied)
# Proposal files must be in correspondence with the datasets listed in
# TRAIN.DATASETS
__C.TRAIN.PROPOSAL_FILES = ()
# Make minibatches from images that have similar aspect ratios (i.e. both
# tall and thin or both short and wide)
# This feature is critical for saving memory (and makes training slightly
# faster)
__C.TRAIN.ASPECT_GROUPING = True
# ---------------------------------------------------------------------------- #
# RPN training options
# ---------------------------------------------------------------------------- #
# Minimum overlap required between an anchor and ground-truth box for the
# (anchor, gt box) pair to be a positive example (IOU >= thresh ==> positive RPN
# example)
__C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
# Maximum overlap allowed between an anchor and ground-truth box for the
# (anchor, gt box) pair to be a negative example (IOU < thresh ==> negative RPN
# example)
__C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
# Target fraction of foreground (positive) examples per RPN minibatch
__C.TRAIN.RPN_FG_FRACTION = 0.5
# Total number of RPN examples per image
__C.TRAIN.RPN_BATCH_SIZE_PER_IM = 256
# NMS threshold used on RPN proposals (used during end-to-end training with RPN)
__C.TRAIN.RPN_NMS_THRESH = 0.7
# Number of top scoring RPN proposals to keep before applying NMS
# When FPN is used, this is *per FPN level* (not total)
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring RPN proposals to keep after applying NMS
# This is the total number of RPN proposals produced (for both FPN and non-FPN
# cases)
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000
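# E.g., with the default five FPN levels (RPN_MIN_LEVEL 2 to RPN_MAX_LEVEL 6)
# and the defaults above, up to 5 * 12000 proposals enter NMS, after which the
# top 2000 are kept in total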
# Remove RPN anchors that go outside the image by RPN_STRADDLE_THRESH pixels
# Set to -1 or a large value, e.g. 100000, to disable pruning anchors
__C.TRAIN.RPN_STRADDLE_THRESH = 0
# Proposal height and width both need to be greater than RPN_MIN_SIZE
# (at orig image scale; not scale used during training or inference)
__C.TRAIN.RPN_MIN_SIZE = 0
# Filter proposals that are inside of crowd regions by CROWD_FILTER_THRESH
# "Inside" is measured as: proposal-with-crowd intersection area divided by
# proposal area
__C.TRAIN.CROWD_FILTER_THRESH = 0.7
# Ignore ground-truth objects with area < this threshold
__C.TRAIN.GT_MIN_AREA = -1
# Freeze the backbone architecture during training if set to True
__C.TRAIN.FREEZE_CONV_BODY = False
# Training will resume from the latest snapshot (model checkpoint) found in the
# output directory
__C.TRAIN.AUTO_RESUME = True
# ---------------------------------------------------------------------------- #
# Data loader options
# ---------------------------------------------------------------------------- #
__C.DATA_LOADER = AttrDict()
# Number of Python threads to use for the data loader (warning: using too many
# threads can cause GIL-based interference with Python Ops leading to *slower*
# training; 4 seems to be the sweet spot in our experience)
__C.DATA_LOADER.NUM_THREADS = 4
# ---------------------------------------------------------------------------- #
# Inference ('test') options
# ---------------------------------------------------------------------------- #
__C.TEST = AttrDict()
# Initialize network with weights from this .pkl file
__C.TEST.WEIGHTS = b''
# Datasets to test on
# Available dataset list: datasets.dataset_catalog.DATASETS.keys()
# If multiple datasets are listed, testing is performed on each one sequentially
__C.TEST.DATASETS = ()
# Scales to use during testing
# Each scale is the pixel size of an image's shortest side
# If multiple scales are given, then all scales are used as in multiscale
# inference
__C.TEST.SCALES = (600, )
# Max pixel size of the longest side of a scaled input image
__C.TEST.MAX_SIZE = 1000
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
__C.TEST.NMS = 0.3
# Apply Fast R-CNN style bounding-box regression if True
__C.TEST.BBOX_REG = True
# Test using these proposal files (must correspond with TEST.DATASETS)
__C.TEST.PROPOSAL_FILES = ()
# Limit on the number of proposals per image used during inference
__C.TEST.PROPOSAL_LIMIT = 2000
# NMS threshold used on RPN proposals
__C.TEST.RPN_NMS_THRESH = 0.7
# Number of top scoring RPN proposals to keep before applying NMS
# When FPN is used, this is *per FPN level* (not total)
__C.TEST.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring RPN proposals to keep after applying NMS
# This is the total number of RPN proposals produced (for both FPN and non-FPN
# cases)
__C.TEST.RPN_POST_NMS_TOP_N = 2000
# Proposal height and width both need to be greater than RPN_MIN_SIZE
# (at orig image scale; not scale used during training or inference)
__C.TEST.RPN_MIN_SIZE = 0
# Maximum number of detections to return per image (100 is based on the limit
# established for the COCO dataset)
__C.TEST.DETECTIONS_PER_IM = 100
# Minimum score threshold (assuming scores in a [0, 1] range); a value chosen to
# balance obtaining high recall with not having too many low precision
# detections that will slow down inference post processing steps (like NMS)
__C.TEST.SCORE_THRESH = 0.05
# Save detection results files if True
# If false, results files are cleaned up (they can be large) after local
# evaluation
__C.TEST.COMPETITION_MODE = True
# Evaluate detections with the COCO json dataset eval code even if it's not the
# evaluation code for the dataset (e.g. evaluate PASCAL VOC results using the
# COCO API to get COCO style AP on PASCAL VOC)
__C.TEST.FORCE_JSON_DATASET_EVAL = False
# Number of images to test on - presently used only in RetinaNet inference
# If the dataset name includes 'test-dev' or 'test', this is ignored (i.e.,
# it's intended to apply to a validation set)
__C.TEST.NUM_TEST_IMAGES = 5000
# [Inferred value; do not set directly in a config]
# Indicates if precomputed proposals are used at test time
# Automatically set to False for 1-stage models and for 2-stage models with
# the RPN subnetwork enabled (see assert_and_infer_cfg)
__C.TEST.PRECOMPUTED_PROPOSALS = True
# [Inferred value; do not set directly in a config]
# Active dataset to test on
__C.TEST.DATASET = b''
# [Inferred value; do not set directly in a config]
# Active proposal file to use
__C.TEST.PROPOSAL_FILE = b''
# ---------------------------------------------------------------------------- #
# Test-time augmentations for bounding box detection
# See configs/test_time_aug/e2e_mask_rcnn_R-50-FPN_2x.yaml for an example
# ---------------------------------------------------------------------------- #
__C.TEST.BBOX_AUG = AttrDict()
# Enable test-time augmentation for bounding box detection if True
__C.TEST.BBOX_AUG.ENABLED = False
# Heuristic used to combine predicted box scores
# Valid options: ('ID', 'AVG', 'UNION')
__C.TEST.BBOX_AUG.SCORE_HEUR = b'UNION'
# Heuristic used to combine predicted box coordinates
# Valid options: ('ID', 'AVG', 'UNION')
__C.TEST.BBOX_AUG.COORD_HEUR = b'UNION'
# Horizontal flip at the original scale (id transform)
__C.TEST.BBOX_AUG.H_FLIP = False
# Each scale is the pixel size of an image's shortest side
__C.TEST.BBOX_AUG.SCALES = ()
# Max pixel size of the longer side
__C.TEST.BBOX_AUG.MAX_SIZE = 4000
# Horizontal flip at each scale
__C.TEST.BBOX_AUG.SCALE_H_FLIP = False
# Apply scaling based on object size
__C.TEST.BBOX_AUG.SCALE_SIZE_DEP = False
__C.TEST.BBOX_AUG.AREA_TH_LO = 50**2
__C.TEST.BBOX_AUG.AREA_TH_HI = 180**2
# Each aspect ratio is relative to image width
__C.TEST.BBOX_AUG.ASPECT_RATIOS = ()
# Horizontal flip at each aspect ratio
__C.TEST.BBOX_AUG.ASPECT_RATIO_H_FLIP = False
# ---------------------------------------------------------------------------- #
# Test-time augmentations for mask detection
# See configs/test_time_aug/e2e_mask_rcnn_R-50-FPN_2x.yaml for an example
# ---------------------------------------------------------------------------- #
__C.TEST.MASK_AUG = AttrDict()
# Enable test-time augmentation for instance mask detection if True
__C.TEST.MASK_AUG.ENABLED = False
# Heuristic used to combine mask predictions
# SOFT prefix indicates that the computation is performed on soft masks
# Valid options: ('SOFT_AVG', 'SOFT_MAX', 'LOGIT_AVG')
__C.TEST.MASK_AUG.HEUR = b'SOFT_AVG'
# Horizontal flip at the original scale (id transform)
__C.TEST.MASK_AUG.H_FLIP = False
# Each scale is the pixel size of an image's shortest side
__C.TEST.MASK_AUG.SCALES = ()
# Max pixel size of the longer side
__C.TEST.MASK_AUG.MAX_SIZE = 4000
# Horizontal flip at each scale
__C.TEST.MASK_AUG.SCALE_H_FLIP = False
# Apply scaling based on object size
__C.TEST.MASK_AUG.SCALE_SIZE_DEP = False
__C.TEST.MASK_AUG.AREA_TH = 180**2
# Each aspect ratio is relative to image width
__C.TEST.MASK_AUG.ASPECT_RATIOS = ()
# Horizontal flip at each aspect ratio
__C.TEST.MASK_AUG.ASPECT_RATIO_H_FLIP = False
# ---------------------------------------------------------------------------- #
# Test-time augmentations for keypoint detection
# See configs/test_time_aug/keypoint_rcnn_R-50-FPN_1x.yaml for an example
# ---------------------------------------------------------------------------- #
__C.TEST.KPS_AUG = AttrDict()
# Enable test-time augmentation for keypoint detection if True
__C.TEST.KPS_AUG.ENABLED = False
# Heuristic used to combine keypoint predictions
# Valid options: ('HM_AVG', 'HM_MAX')
__C.TEST.KPS_AUG.HEUR = b'HM_AVG'
# Horizontal flip at the original scale (id transform)
__C.TEST.KPS_AUG.H_FLIP = False
# Each scale is the pixel size of an image's shortest side
__C.TEST.KPS_AUG.SCALES = ()
# Max pixel size of the longer side
__C.TEST.KPS_AUG.MAX_SIZE = 4000
# Horizontal flip at each scale
__C.TEST.KPS_AUG.SCALE_H_FLIP = False
# Apply scaling based on object size
__C.TEST.KPS_AUG.SCALE_SIZE_DEP = False
__C.TEST.KPS_AUG.AREA_TH = 180**2
# Each aspect ratio is relative to image width
__C.TEST.KPS_AUG.ASPECT_RATIOS = ()
# Horizontal flip at each aspect ratio
__C.TEST.KPS_AUG.ASPECT_RATIO_H_FLIP = False
# ---------------------------------------------------------------------------- #
# Soft NMS
# ---------------------------------------------------------------------------- #
__C.TEST.SOFT_NMS = AttrDict()
# Use soft NMS instead of standard NMS if set to True
__C.TEST.SOFT_NMS.ENABLED = False
# See soft NMS paper for definition of these options
__C.TEST.SOFT_NMS.METHOD = b'linear'
__C.TEST.SOFT_NMS.SIGMA = 0.5
# For the soft NMS overlap threshold, we simply use TEST.NMS
# ---------------------------------------------------------------------------- #
# Bounding box voting (from the Multi-Region CNN paper)
# ---------------------------------------------------------------------------- #
__C.TEST.BBOX_VOTE = AttrDict()
# Use box voting if set to True
__C.TEST.BBOX_VOTE.ENABLED = False
# We use TEST.NMS threshold for the NMS step. VOTE_TH overlap threshold
# is used to select voting boxes (IoU >= VOTE_TH) for each box that survives NMS
__C.TEST.BBOX_VOTE.VOTE_TH = 0.8
# The method used to combine scores when doing bounding box voting
# Valid options include ('ID', 'AVG', 'IOU_AVG', 'GENERALIZED_AVG', 'QUASI_SUM')
__C.TEST.BBOX_VOTE.SCORING_METHOD = b'ID'
# Hyperparameter used by the scoring method (it has different meanings for
# different methods)
__C.TEST.BBOX_VOTE.SCORING_METHOD_BETA = 1.0
# ---------------------------------------------------------------------------- #
# Model options
# ---------------------------------------------------------------------------- #
__C.MODEL = AttrDict()
# The type of model to use
# The string must match a function in the modeling.model_builder module
# (e.g., 'generalized_rcnn', 'mask_rcnn', ...)
__C.MODEL.TYPE = b''
# The backbone conv body to use
# The string must match a function that is imported in modeling.model_builder
# (e.g., 'FPN.add_fpn_ResNet101_conv5_body' to specify a ResNet-101-FPN
# backbone)
__C.MODEL.CONV_BODY = b''
# Number of classes in the dataset; must be set
# E.g., 81 for COCO (80 foreground + 1 background)
__C.MODEL.NUM_CLASSES = -1
# Use a class agnostic bounding box regressor instead of the default per-class
# regressor
__C.MODEL.CLS_AGNOSTIC_BBOX_REG = False
# Default weights on (dx, dy, dw, dh) for normalizing bbox regression targets
# These are empirically chosen to approximately lead to unit variance targets
__C.MODEL.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# The meaning of FASTER_RCNN depends on the context (training vs. inference):
# 1) During training, FASTER_RCNN = True means that end-to-end training will be
# used to jointly train the RPN subnetwork and the Fast R-CNN subnetwork
# (Faster R-CNN = RPN + Fast R-CNN).
# 2) During inference, FASTER_RCNN = True means that the model's RPN subnetwork
# will be used to generate proposals rather than relying on precomputed
# proposals. Note that FASTER_RCNN = True can be used at inference time even
# if the Faster R-CNN model was trained with stagewise training (which
# consists of alternating between RPN and Fast R-CNN training in a way that
# finally leads to a single network).
__C.MODEL.FASTER_RCNN = False
# Indicates the model makes instance mask predictions (as in Mask R-CNN)
__C.MODEL.MASK_ON = False
# Indicates the model makes keypoint predictions (as in Mask R-CNN for
# keypoints)
__C.MODEL.KEYPOINTS_ON = False
# Indicates the model's computation terminates with the production of RPN
# proposals (i.e., it outputs proposals ONLY, no actual object detections)
__C.MODEL.RPN_ONLY = False
# Caffe2 net execution type
# Use 'prof_dag' to get profiling statistics
__C.MODEL.EXECUTION_TYPE = b'dag'
# ---------------------------------------------------------------------------- #
# RetinaNet options
# ---------------------------------------------------------------------------- #
__C.RETINANET = AttrDict()
# RetinaNet is used (instead of Fast/er/Mask R-CNN/R-FCN/RPN) if True
__C.RETINANET.RETINANET_ON = False
# Anchor aspect ratios to use
__C.RETINANET.ASPECT_RATIOS = (0.5, 1.0, 2.0)
# Anchor scales per octave
__C.RETINANET.SCALES_PER_OCTAVE = 3
# At each FPN level, we generate anchors based on their scale, aspect ratio,
# and the stride of the level, and we multiply the resulting anchor size by
# ANCHOR_SCALE
__C.RETINANET.ANCHOR_SCALE = 4
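# E.g., at the FPN level with stride 8, the base anchor spans
# 8 * ANCHOR_SCALE = 32 pixels before aspect-ratio and per-octave scaling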
# Convolutions to use in the cls and bbox tower
# NOTE: this doesn't include the last conv for logits
__C.RETINANET.NUM_CONVS = 4
# Weight for bbox_regression loss
__C.RETINANET.BBOX_REG_WEIGHT = 1.0
# Smooth L1 loss beta for bbox regression
__C.RETINANET.BBOX_REG_BETA = 0.11
# During inference, #locs to select based on cls score before NMS is performed
# per FPN level
__C.RETINANET.PRE_NMS_TOP_N = 1000
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
__C.RETINANET.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
__C.RETINANET.NEGATIVE_OVERLAP = 0.4
# Focal loss parameter: alpha
__C.RETINANET.LOSS_ALPHA = 0.25
# Focal loss parameter: gamma
__C.RETINANET.LOSS_GAMMA = 2.0
# Prior prob for the positives at the beginning of training. This is used to set
# the bias init for the logits layer
__C.RETINANET.PRIOR_PROB = 0.01
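# (per the RetinaNet paper, the bias is initialized to
# -log((1 - PRIOR_PROB) / PRIOR_PROB))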
# Whether classification and bbox branch tower should be shared or not
__C.RETINANET.SHARE_CLS_BBOX_TOWER = False
# Use class specific bounding box regression instead of the default class
# agnostic regression
__C.RETINANET.CLASS_SPECIFIC_BBOX = False
# Whether softmax should be used in classification branch training
__C.RETINANET.SOFTMAX = False
# Inference cls score threshold, anchors with score > INFERENCE_TH are
# considered for inference
__C.RETINANET.INFERENCE_TH = 0.05
# ---------------------------------------------------------------------------- #
# Solver options
# Note: all solver options are used exactly as specified; the implication is
# that if you switch from training on 1 GPU to N GPUs, you MUST adjust the
# solver configuration accordingly. We suggest using gradual warmup and the
# linear learning rate scaling rule as described in
# "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour" Goyal et al.
# https://arxiv.org/abs/1706.02677
# ---------------------------------------------------------------------------- #
__C.SOLVER = AttrDict()
# Base learning rate for the specified schedule
__C.SOLVER.BASE_LR = 0.001
# Schedule type (see functions in utils.lr_policy for options)
# E.g., 'step', 'steps_with_decay', ...
__C.SOLVER.LR_POLICY = b'step'
# Some LR Policies (by example):
# 'step'
# lr = SOLVER.BASE_LR * SOLVER.GAMMA ** (cur_iter // SOLVER.STEP_SIZE)
# 'steps_with_decay'
# SOLVER.STEPS = [0, 60000, 80000]
# SOLVER.GAMMA = 0.1
# lr = SOLVER.BASE_LR * SOLVER.GAMMA ** current_step
# iters [0, 59999] are in current_step = 0, iters [60000, 79999] are in
# current_step = 1, and so on
# 'steps_with_lrs'
# SOLVER.STEPS = [0, 60000, 80000]
# SOLVER.LRS = [0.02, 0.002, 0.0002]
# lr = LRS[current_step]
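# Worked example for 'steps_with_decay' with the settings above and the
# default SOLVER.BASE_LR = 0.001: at iter 70000, current_step = 1, so
# lr = 0.001 * 0.1 ** 1 = 0.0001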
# Hyperparameter used by the specified policy
# For 'step', the current LR is multiplied by SOLVER.GAMMA at each step
__C.SOLVER.GAMMA = 0.1
# Uniform step size for the 'step' policy
__C.SOLVER.STEP_SIZE = 30000
# Non-uniform step iterations for 'steps_with_decay' or 'steps_with_lrs'
# policies
__C.SOLVER.STEPS = []
# Learning rates to use with 'steps_with_lrs' policy
__C.SOLVER.LRS = []
# Maximum number of SGD iterations
__C.SOLVER.MAX_ITER = 40000
# Momentum to use with SGD
__C.SOLVER.MOMENTUM = 0.9
# L2 regularization hyperparameter
__C.SOLVER.WEIGHT_DECAY = 0.0005
# Warm up to SOLVER.BASE_LR over this number of SGD iterations
__C.SOLVER.WARM_UP_ITERS = 500
# Start the warm up from SOLVER.BASE_LR * SOLVER.WARM_UP_FACTOR
__C.SOLVER.WARM_UP_FACTOR = 1.0 / 3.0
# WARM_UP_METHOD can be either 'constant' or 'linear' (i.e., gradual)
__C.SOLVER.WARM_UP_METHOD = 'linear'
# Scale the momentum update history by new_lr / old_lr when updating the
# learning rate (this is correct given MomentumSGDUpdateOp)
__C.SOLVER.SCALE_MOMENTUM = True
# Only apply the correction if the relative LR change exceeds this threshold
# (prevents every change during linear warm up from scaling the momentum by a
# tiny amount; momentum scaling is only important if the LR change is large)
__C.SOLVER.SCALE_MOMENTUM_THRESHOLD = 1.1
# Suppress logging of changes to LR unless the relative change exceeds this
# threshold (prevents linear warm up from spamming the training log)
__C.SOLVER.LOG_LR_CHANGE_THRESHOLD = 1.1
# ---------------------------------------------------------------------------- #
# Fast R-CNN options
# ---------------------------------------------------------------------------- #
__C.FAST_RCNN = AttrDict()
# The type of RoI head to use for bounding box classification and regression
# The string must match a function that is imported in modeling.model_builder
# (e.g., 'head_builder.add_roi_2mlp_head' to specify a two hidden layer MLP)
__C.FAST_RCNN.ROI_BOX_HEAD = b''
# Hidden layer dimension when using an MLP for the RoI box head
__C.FAST_RCNN.MLP_HEAD_DIM = 1024
# RoI transformation function (e.g., RoIPool or RoIAlign)
# (RoIPoolF is the same as RoIPool; ignore the trailing 'F')
__C.FAST_RCNN.ROI_XFORM_METHOD = b'RoIPoolF'
# Number of grid sampling points in RoIAlign (usually use 2)
# Only applies to RoIAlign
__C.FAST_RCNN.ROI_XFORM_SAMPLING_RATIO = 0
# RoI transform output resolution
# Note: some models may have constraints on what they can use, e.g. they use
# pretrained FC layers like in VGG16, and will ignore this option
__C.FAST_RCNN.ROI_XFORM_RESOLUTION = 14
# ---------------------------------------------------------------------------- #
# RPN options
# ---------------------------------------------------------------------------- #
__C.RPN = AttrDict()
# [Inferred value; do not set directly in a config]
# Indicates that the model contains an RPN subnetwork
__C.RPN.RPN_ON = False
# RPN anchor sizes given in absolute pixels w.r.t. the scaled network input
# Note: these options are *not* used by FPN RPN; see FPN.RPN* options
__C.RPN.SIZES = (64, 128, 256, 512)
# Stride of the feature map that RPN is attached to
__C.RPN.STRIDE = 16
# RPN anchor aspect ratios
__C.RPN.ASPECT_RATIOS = (0.5, 1, 2)
# ---------------------------------------------------------------------------- #
# FPN options
# ---------------------------------------------------------------------------- #
__C.FPN = AttrDict()
# FPN is enabled if True
__C.FPN.FPN_ON = False
# Channel dimension of the FPN feature levels
__C.FPN.DIM = 256
# Initialize the lateral connections to output zero if True
__C.FPN.ZERO_INIT_LATERAL = False
# Stride of the coarsest FPN level
# This is needed so the input can be padded properly
__C.FPN.COARSEST_STRIDE = 32
#
# FPN may be used for just RPN, just object detection, or both
#
# Use FPN for RoI transform for object detection if True
__C.FPN.MULTILEVEL_ROIS = False
# Hyperparameters for the RoI-to-FPN level mapping heuristic
__C.FPN.ROI_CANONICAL_SCALE = 224 # s0
__C.FPN.ROI_CANONICAL_LEVEL = 4 # k0: where s0 maps to
# Coarsest level of the FPN pyramid
__C.FPN.ROI_MAX_LEVEL = 5
# Finest level of the FPN pyramid
__C.FPN.ROI_MIN_LEVEL = 2
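# The mapping heuristic follows the FPN paper: an RoI of size w x h is
# assigned to level k = floor(k0 + log2(sqrt(w * h) / s0)), clamped to
# [ROI_MIN_LEVEL, ROI_MAX_LEVEL]; e.g., a 112x112 RoI maps to level 3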
# Use FPN for RPN if True
__C.FPN.MULTILEVEL_RPN = False
# Coarsest level of the FPN pyramid
__C.FPN.RPN_MAX_LEVEL = 6
# Finest level of the FPN pyramid
__C.FPN.RPN_MIN_LEVEL = 2
# FPN RPN anchor aspect ratios
__C.FPN.RPN_ASPECT_RATIOS = (0.5, 1, 2)
# RPN anchors start at this size on RPN_MIN_LEVEL
# The anchor size doubles at each level after that
# With a default of 32 and levels 2 to 6, we get anchor sizes of 32 to 512
__C.FPN.RPN_ANCHOR_START_SIZE = 32
# Use extra FPN levels, as done in the RetinaNet paper
__C.FPN.EXTRA_CONV_LEVELS = False
# ---------------------------------------------------------------------------- #
# Mask R-CNN options ("MRCNN" means Mask R-CNN)
# ---------------------------------------------------------------------------- #
__C.MRCNN = AttrDict()
# The type of RoI head to use for instance mask prediction
# The string must match a function that is imported in modeling.model_builder
# (e.g., 'mask_rcnn_heads.ResNet_mask_rcnn_fcn_head_v1up4convs')
__C.MRCNN.ROI_MASK_HEAD = b''
# Resolution of mask predictions
__C.MRCNN.RESOLUTION = 14
# RoI transformation function and associated options
__C.MRCNN.ROI_XFORM_METHOD = b'RoIAlign'
# RoI transform output resolution
__C.MRCNN.ROI_XFORM_RESOLUTION = 7
# Number of grid sampling points in RoIAlign (usually use 2)
# Only applies to RoIAlign
__C.MRCNN.ROI_XFORM_SAMPLING_RATIO = 0
# Number of channels in the mask head
__C.MRCNN.DIM_REDUCED = 256
# Use dilated convolution in the mask head
__C.MRCNN.DILATION = 2
# Upsample the predicted masks by this factor
__C.MRCNN.UPSAMPLE_RATIO = 1
# Use a fully-connected layer to predict the final masks instead of a conv layer
__C.MRCNN.USE_FC_OUTPUT = False
# Weight initialization method for the mask head and mask output layers
__C.MRCNN.CONV_INIT = b'GaussianFill'
# Use class specific mask predictions if True (otherwise use class agnostic mask
# predictions)
__C.MRCNN.CLS_SPECIFIC_MASK = True
# Multi-task loss weight for masks
__C.MRCNN.WEIGHT_LOSS_MASK = 1.0
# Binarization threshold for converting soft masks to hard masks
__C.MRCNN.THRESH_BINARIZE = 0.5
# ---------------------------------------------------------------------------- #
# Keypoint Mask R-CNN options ("KRCNN" = Mask R-CNN with Keypoint support)
# ---------------------------------------------------------------------------- #
__C.KRCNN = AttrDict()
# The type of RoI head to use for instance keypoint prediction
# The string must match a function that is imported in modeling.model_builder
# (e.g., 'keypoint_rcnn_heads.add_roi_pose_head_v1convX')
__C.KRCNN.ROI_KEYPOINTS_HEAD = b''
# Output size (and the size the loss is computed on), e.g., 56x56
__C.KRCNN.HEATMAP_SIZE = -1
# Use bilinear interpolation to upsample the final heatmap by this factor
__C.KRCNN.UP_SCALE = -1
# Apply a ConvTranspose layer to the hidden representation computed by the
# keypoint head prior to predicting the per-keypoint heatmaps
__C.KRCNN.USE_DECONV = False
# Channel dimension of the hidden representation produced by the ConvTranspose
__C.KRCNN.DECONV_DIM = 256
# Use a ConvTranspose layer to predict the per-keypoint heatmaps
__C.KRCNN.USE_DECONV_OUTPUT = False
# Use dilation in the keypoint head
__C.KRCNN.DILATION = 1
# Size of the kernels to use in all ConvTranspose operations
__C.KRCNN.DECONV_KERNEL = 4
# Number of keypoints in the dataset (e.g., 17 for COCO)
__C.KRCNN.NUM_KEYPOINTS = -1
# Number of stacked Conv layers in keypoint head
__C.KRCNN.NUM_STACKED_CONVS = 8
# Dimension of the hidden representation output by the keypoint head
__C.KRCNN.CONV_HEAD_DIM = 256
# Conv kernel size used in the keypoint head
__C.KRCNN.CONV_HEAD_KERNEL = 3
# Conv kernel weight filling function
__C.KRCNN.CONV_INIT = b'GaussianFill'
# Use NMS based on OKS if True
__C.KRCNN.NMS_OKS = False
# Source of keypoint confidence
# Valid options: ('bbox', 'logit', 'prob')
__C.KRCNN.KEYPOINT_CONFIDENCE = b'bbox'
# Standard ROI XFORM options (see FAST_RCNN or MRCNN options)
__C.KRCNN.ROI_XFORM_METHOD = b'RoIAlign'
__C.KRCNN.ROI_XFORM_RESOLUTION = 7
__C.KRCNN.ROI_XFORM_SAMPLING_RATIO = 0
# Minimum number of labeled keypoints that must exist in a minibatch (otherwise
# the minibatch is discarded)
__C.KRCNN.MIN_KEYPOINT_COUNT_FOR_VALID_MINIBATCH = 20
# When inferring the keypoint locations from the heatmap, don't scale the heatmap
# below this minimum size
__C.KRCNN.INFERENCE_MIN_SIZE = 0
# Multi-task loss weight to use for keypoints
# Recommended values:
# - use 1.0 if KRCNN.NORMALIZE_BY_VISIBLE_KEYPOINTS is True
# - use 4.0 if KRCNN.NORMALIZE_BY_VISIBLE_KEYPOINTS is False
__C.KRCNN.LOSS_WEIGHT = 1.0
# Normalize by the total number of visible keypoints in the minibatch if True.
# Otherwise, normalize by the total number of keypoints that could ever exist
# in the minibatch. See comments in modeling.model_builder.add_keypoint_losses
# for detailed discussion.
__C.KRCNN.NORMALIZE_BY_VISIBLE_KEYPOINTS = True
# ---------------------------------------------------------------------------- #
# R-FCN options
# ---------------------------------------------------------------------------- #
__C.RFCN = AttrDict()
# Position-sensitive RoI pooling output grid size (height and width)
__C.RFCN.PS_GRID_SIZE = 3
# ---------------------------------------------------------------------------- #
# ResNets options ("ResNets" = ResNet and ResNeXt)
# ---------------------------------------------------------------------------- #
__C.RESNETS = AttrDict()
# Number of groups to use; 1 ==> ResNet; > 1 ==> ResNeXt
__C.RESNETS.NUM_GROUPS = 1
# Baseline width of each group
__C.RESNETS.WIDTH_PER_GROUP = 64
# Place the stride 2 conv on the 1x1 filter
# Use True only for the original MSRA ResNet; use False for C2 and Torch models
__C.RESNETS.STRIDE_1X1 = True
# Residual transformation function
__C.RESNETS.TRANS_FUNC = b'bottleneck_transformation'
# Apply dilation in stage "res5"
__C.RESNETS.RES5_DILATION = 1
# ---------------------------------------------------------------------------- #
# Misc options
# ---------------------------------------------------------------------------- #
# Number of GPUs to use (applies to both training and testing)
__C.NUM_GPUS = 1
# Use NCCL for all reduce, otherwise use muji
# Warning: if set to True, you may experience deadlocks
__C.USE_NCCL = False
# The mapping from image coordinates to feature map coordinates might cause
# some boxes that are distinct in image space to become identical in feature
# coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor
# for identifying duplicate boxes.
# 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
__C.DEDUP_BOXES = 1 / 16.
# Clip bounding box transformation predictions to prevent np.exp from
# overflowing
# Heuristic choice based on what would scale a 16 pixel anchor up to 1000 pixels
__C.BBOX_XFORM_CLIP = np.log(1000. / 16.)
# Pixel mean values (BGR order) as a (1, 1, 3) array
# We use the same pixel mean for all networks even though it's not exactly what
# they were trained with
# "Fun" fact: the history of where these values comes from is lost
__C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
# For reproducibility...but not really because modern fast GPU libraries use
# non-deterministic op implementations
__C.RNG_SEED = 3
# A small number that's used many times
__C.EPS = 1e-14
# Root directory of project
__C.ROOT_DIR = os.getcwd()
# Output basedir
__C.OUTPUT_DIR = b'/tmp'
# Name (or path to) the matlab executable
__C.MATLAB = b'matlab'
# Reduce memory usage with memonger gradient blob sharing
__C.MEMONGER = True
# Further reduce memory by allowing forward pass activations to be shared when
# possible. Note that this will cause activation blob inspection (values,
# shapes, etc.) to be meaningless when activation blobs are reused.
__C.MEMONGER_SHARE_ACTIVATIONS = False
# Dump detection visualizations
__C.VIS = False
# Score threshold for visualization
__C.VIS_TH = 0.9
# Expected results should take the form of a list of expectations, each
# specified by four elements (dataset, task, metric, expected value). For
# example: [['coco_2014_minival', 'box_proposal', 'AR@1000', 0.387]]
__C.EXPECTED_RESULTS = []
# Absolute and relative tolerance to use when comparing to EXPECTED_RESULTS
__C.EXPECTED_RESULTS_RTOL = 0.1
__C.EXPECTED_RESULTS_ATOL = 0.005
# Set to send email in case of an EXPECTED_RESULTS failure
__C.EXPECTED_RESULTS_EMAIL = b''
# Models and proposals referred to by URL are downloaded to a local cache
# specified by DOWNLOAD_CACHE
__C.DOWNLOAD_CACHE = b'/tmp/detectron-download-cache'
# ---------------------------------------------------------------------------- #
# Cluster options
# ---------------------------------------------------------------------------- #
__C.CLUSTER = AttrDict()
# Flag to indicate if the code is running in a cluster environment
__C.CLUSTER.ON_CLUSTER = False
# ---------------------------------------------------------------------------- #
# Deprecated options
# If an option is removed from the code and you don't want to break existing
# yaml configs, you can add the full config key as a string to the set below.
# ---------------------------------------------------------------------------- #
_DEPRECATED_KEYS = set(
(
'FINAL_MSG',
'MODEL.DILATION',
'ROOT_GPU_ID',
'RPN.ON',
'TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED',
'TRAIN.DROPOUT',
'USE_GPU_NMS',
)
)
# ---------------------------------------------------------------------------- #
# Renamed options
# If you rename a config option, record the mapping from the old name to the new
# name in the dictionary below. Optionally, if the type also changed, you can
# make the value a tuple that specifies first the renamed key and then
# instructions for how to edit the config file.
# ---------------------------------------------------------------------------- #
_RENAMED_KEYS = {
'EXAMPLE.RENAMED.KEY': 'EXAMPLE.KEY', # Dummy example to follow
'MODEL.PS_GRID_SIZE': 'RFCN.PS_GRID_SIZE',
'MODEL.ROI_HEAD': 'FAST_RCNN.ROI_BOX_HEAD',
'MRCNN.MASK_HEAD_NAME': 'MRCNN.ROI_MASK_HEAD',
'TRAIN.DATASET': (
'TRAIN.DATASETS',
"Also convert to a tuple, e.g., " +
"'coco_2014_train' -> ('coco_2014_train',) or " +
"'coco_2014_train:coco_2014_valminusminival' -> " +
"('coco_2014_train', 'coco_2014_valminusminival')"
),
'TRAIN.PROPOSAL_FILE': (
'TRAIN.PROPOSAL_FILES',
"Also convert to a tuple, e.g., " +
"'path/to/file' -> ('path/to/file',) or " +
"'path/to/file1:path/to/file2' -> " +
"('path/to/file1', 'path/to/file2')"
),
}
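# For example, a legacy config that sets TRAIN.DATASET will raise a KeyError
# (via _raise_key_rename_error below) instructing the user to switch to
# TRAIN.DATASETS with a tuple value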
def assert_and_infer_cfg(cache_urls=True):
if __C.MODEL.RPN_ONLY or __C.MODEL.FASTER_RCNN:
__C.RPN.RPN_ON = True
if __C.RPN.RPN_ON or __C.RETINANET.RETINANET_ON:
__C.TEST.PRECOMPUTED_PROPOSALS = False
if cache_urls:
cache_cfg_urls()
def cache_cfg_urls():
"""Download URLs in the config, cache them locally, and rewrite cfg to make
use of the locally cached file.
"""
__C.TRAIN.WEIGHTS = cache_url(__C.TRAIN.WEIGHTS, __C.DOWNLOAD_CACHE)
__C.TEST.WEIGHTS = cache_url(__C.TEST.WEIGHTS, __C.DOWNLOAD_CACHE)
__C.TRAIN.PROPOSAL_FILES = tuple(
[cache_url(f, __C.DOWNLOAD_CACHE) for f in __C.TRAIN.PROPOSAL_FILES]
)
__C.TEST.PROPOSAL_FILES = tuple(
[cache_url(f, __C.DOWNLOAD_CACHE) for f in __C.TEST.PROPOSAL_FILES]
)
def get_output_dir(training=True):
"""Get the output directory determined by the current global config."""
dataset = __C.TRAIN.DATASETS if training else __C.TEST.DATASETS
dataset = ':'.join(dataset)
tag = 'train' if training else 'test'
# <output-dir>/<train|test>/<dataset>/<model-type>/
outdir = osp.join(__C.OUTPUT_DIR, tag, dataset, __C.MODEL.TYPE)
if not osp.exists(outdir):
os.makedirs(outdir)
return outdir
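# For example, with the defaults OUTPUT_DIR = '/tmp' and
# MODEL.TYPE = 'generalized_rcnn', testing on TEST.DATASETS =
# ('coco_2014_minival',) writes to /tmp/test/coco_2014_minival/generalized_rcnn/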
def merge_cfg_from_file(cfg_filename):
"""Load a yaml config file and merge it into the global config."""
with open(cfg_filename, 'r') as f:
yaml_cfg = AttrDict(yaml.load(f))
_merge_a_into_b(yaml_cfg, __C)
def merge_cfg_from_cfg(cfg_other):
"""Merge `cfg_other` into the global config."""
_merge_a_into_b(cfg_other, __C)
def merge_cfg_from_list(cfg_list):
"""Merge config keys, values in a list (e.g., from command line) into the
global config. For example, `cfg_list = ['TEST.NMS', 0.5]`.
"""
assert len(cfg_list) % 2 == 0
for full_key, v in zip(cfg_list[0::2], cfg_list[1::2]):
if _key_is_deprecated(full_key):
continue
if _key_is_renamed(full_key):
_raise_key_rename_error(full_key)
key_list = full_key.split('.')
d = __C
for subkey in key_list[:-1]:
assert subkey in d, 'Non-existent key: {}'.format(full_key)
d = d[subkey]
subkey = key_list[-1]
assert subkey in d, 'Non-existent key: {}'.format(full_key)
value = _decode_cfg_value(v)
value = _check_and_coerce_cfg_value_type(
value, d[subkey], subkey, full_key
)
d[subkey] = value
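# Typical override flow, mirroring tools/{train,test}_net.py (the yaml path
# below is hypothetical):
#   merge_cfg_from_file('configs/my_experiment.yaml')
#   merge_cfg_from_list(['TEST.NMS', '0.4', 'NUM_GPUS', '2'])
#   assert_and_infer_cfg()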
def _merge_a_into_b(a, b, stack=None):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a.
"""
assert isinstance(a, AttrDict), 'Argument `a` must be an AttrDict'
assert isinstance(b, AttrDict), 'Argument `b` must be an AttrDict'
for k, v_ in a.items():
full_key = '.'.join(stack) + '.' + k if stack is not None else k
# a must specify keys that are in b
if k not in b:
if _key_is_deprecated(full_key):
continue
elif _key_is_renamed(full_key):
_raise_key_rename_error(full_key)
else:
raise KeyError('Non-existent config key: {}'.format(full_key))
v = copy.deepcopy(v_)
v = _decode_cfg_value(v)
v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key)
# Recursively merge dicts
if isinstance(v, AttrDict):
try:
stack_push = [k] if stack is None else stack + [k]
_merge_a_into_b(v, b[k], stack=stack_push)
except BaseException:
raise
else:
b[k] = v
def _key_is_deprecated(full_key):
    if full_key in _DEPRECATED_KEYS:
logger.warn(
'Deprecated config key (ignoring): {}'.format(full_key)
)
return True
return False
def _key_is_renamed(full_key):
return full_key in _RENAMED_KEYS
def _raise_key_rename_error(full_key):
new_key = _RENAMED_KEYS[full_key]
if isinstance(new_key, tuple):
msg = ' Note: ' + new_key[1]
new_key = new_key[0]
else:
msg = ''
raise KeyError(
'Key {} was renamed to {}; please update your config.{}'.
format(full_key, new_key, msg)
)
def _decode_cfg_value(v):
"""Decodes a raw config value (e.g., from a yaml config files or command
line argument) into a Python object.
"""
# Configs parsed from raw yaml will contain dictionary keys that need to be
# converted to AttrDict objects
if isinstance(v, dict):
return AttrDict(v)
# All remaining processing is only applied to strings
if not isinstance(v, basestring):
return v
# Try to interpret `v` as a:
# string, number, tuple, list, dict, boolean, or None
try:
v = literal_eval(v)
# The following two excepts allow v to pass through when it represents a
# string.
#
# Longer explanation:
# The type of v is always a string (before calling literal_eval), but
# sometimes it *represents* a string and other times a data structure, like
# a list. In the case that v represents a string, what we got back from the
# yaml parser is 'foo' *without quotes* (so, not '"foo"'). literal_eval is
# ok with '"foo"', but will raise a ValueError if given 'foo'. In other
# cases, like paths (v = 'foo/bar' and not v = '"foo/bar"'), literal_eval
# will raise a SyntaxError.
except ValueError:
pass
except SyntaxError:
pass
return v
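# Illustrative decodings (assuming string inputs from yaml or the command
# line):
#   '0.5'      -> 0.5 (float)
#   '(600,)'   -> (600,) (tuple)
#   'True'     -> True (bool)
#   'foo/bar'  -> 'foo/bar' (rejected by literal_eval; passes through as str)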
def _check_and_coerce_cfg_value_type(value_a, value_b, key, full_key):
"""Checks that `value_a`, which is intended to replace `value_b` is of the
right type. The type is correct if it matches exactly or is one of a few
cases in which the type can be easily coerced.
"""
# The types must match (with some exceptions)
type_b = type(value_b)
type_a = type(value_a)
if type_a is type_b:
return value_a
# Exceptions: numpy arrays, strings, tuple<->list
if isinstance(value_b, np.ndarray):
value_a = np.array(value_a, dtype=value_b.dtype)
elif isinstance(value_b, basestring):
value_a = str(value_a)
elif isinstance(value_a, tuple) and isinstance(value_b, list):
value_a = list(value_a)
elif isinstance(value_a, list) and isinstance(value_b, tuple):
value_a = tuple(value_a)
else:
raise ValueError(
'Type mismatch ({} vs. {}) with values ({} vs. {}) for config '
'key: {}'.format(type_b, type_a, value_b, value_a, full_key)
)
return value_a
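# Illustrative coercions performed above (assuming Python 2 string semantics):
#   [600] replacing (600,)  -> coerced to the tuple (600,)
#   0.4 replacing b'matlab' -> coerced to the string '0.4'
#   'abc' replacing 3       -> ValueError (no safe coercion exists)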
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Functions for RPN proposal generation."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cv2
import datetime
import logging
import numpy as np
import os
import yaml
from caffe2.python import core
from caffe2.python import workspace
from core.config import cfg
from core.config import get_output_dir
from datasets import task_evaluation
from datasets.json_dataset import JsonDataset
from modeling import model_builder
from utils.blob import im_list_to_blob
from utils.io import save_object
from utils.timer import Timer
import utils.c2 as c2_utils
import utils.env as envu
import utils.net as nu
import utils.subprocess as subprocess_utils
logger = logging.getLogger(__name__)
def generate_rpn_on_dataset(multi_gpu=False):
"""Run inference on a dataset."""
output_dir = get_output_dir(training=False)
dataset = JsonDataset(cfg.TEST.DATASET)
test_timer = Timer()
test_timer.tic()
if multi_gpu:
num_images = len(dataset.get_roidb())
_boxes, _scores, _ids, rpn_file = multi_gpu_generate_rpn_on_dataset(
num_images, output_dir
)
else:
# Processes entire dataset range by default
_boxes, _scores, _ids, rpn_file = generate_rpn_on_range()
test_timer.toc()
logger.info('Total inference time: {:.3f}s'.format(test_timer.average_time))
return evaluate_proposal_file(dataset, rpn_file, output_dir)
def multi_gpu_generate_rpn_on_dataset(num_images, output_dir):
"""Multi-gpu inference on a dataset."""
# Retrieve the test_net binary path
binary_dir = envu.get_runtime_dir()
binary_ext = envu.get_py_bin_ext()
binary = os.path.join(binary_dir, 'test_net' + binary_ext)
assert os.path.exists(binary), 'Binary \'{}\' not found'.format(binary)
# Run inference in parallel in subprocesses
outputs = subprocess_utils.process_in_parallel(
'rpn_proposals', num_images, binary, output_dir
)
# Collate the results from each subprocess
boxes, scores, ids = [], [], []
for rpn_data in outputs:
boxes += rpn_data['boxes']
scores += rpn_data['scores']
ids += rpn_data['ids']
rpn_file = os.path.join(output_dir, 'rpn_proposals.pkl')
cfg_yaml = yaml.dump(cfg)
save_object(
dict(boxes=boxes, scores=scores, ids=ids, cfg=cfg_yaml), rpn_file
)
logger.info('Wrote RPN proposals to {}'.format(os.path.abspath(rpn_file)))
return boxes, scores, ids, rpn_file
def generate_rpn_on_range(ind_range=None):
"""Run inference on all images in a dataset or over an index range of images
in a dataset using a single GPU.
"""
assert cfg.TEST.WEIGHTS != '', \
'TEST.WEIGHTS must be set to the model file to test'
assert cfg.TEST.DATASET != '', \
'TEST.DATASET must be set to the dataset name to test'
assert cfg.MODEL.RPN_ONLY or cfg.MODEL.FASTER_RCNN
roidb, start_ind, end_ind, total_num_images = get_roidb(ind_range)
output_dir = get_output_dir(training=False)
logger.info(
'Output will be saved to: {:s}'.format(os.path.abspath(output_dir))
)
model = model_builder.create(cfg.MODEL.TYPE, train=False)
nu.initialize_from_weights_file(model, cfg.TEST.WEIGHTS)
model_builder.add_inference_inputs(model)
workspace.CreateNet(model.net)
boxes, scores, ids = generate_proposals_on_roidb(
model,
roidb,
start_ind=start_ind,
end_ind=end_ind,
total_num_images=total_num_images
)
cfg_yaml = yaml.dump(cfg)
if ind_range is not None:
rpn_name = 'rpn_proposals_range_%s_%s.pkl' % tuple(ind_range)
else:
rpn_name = 'rpn_proposals.pkl'
rpn_file = os.path.join(output_dir, rpn_name)
save_object(
dict(boxes=boxes, scores=scores, ids=ids, cfg=cfg_yaml), rpn_file
)
logger.info('Wrote RPN proposals to {}'.format(os.path.abspath(rpn_file)))
return boxes, scores, ids, rpn_file
def generate_proposals_on_roidb(
model, roidb, start_ind=None, end_ind=None, total_num_images=None
):
"""Generate RPN proposals on all images in an imdb."""
_t = Timer()
num_images = len(roidb)
roidb_boxes = [[] for _ in range(num_images)]
roidb_scores = [[] for _ in range(num_images)]
roidb_ids = [[] for _ in range(num_images)]
if start_ind is None:
start_ind = 0
end_ind = num_images
total_num_images = num_images
for i in range(num_images):
roidb_ids[i] = roidb[i]['id']
im = cv2.imread(roidb[i]['image'])
with c2_utils.NamedCudaScope(0):
_t.tic()
roidb_boxes[i], roidb_scores[i] = im_proposals(model, im)
_t.toc()
if i % 10 == 0:
ave_time = _t.average_time
eta_seconds = ave_time * (num_images - i - 1)
eta = str(datetime.timedelta(seconds=int(eta_seconds)))
logger.info(
(
'rpn_generate: range [{:d}, {:d}] of {:d}: '
'{:d}/{:d} {:.3f}s (eta: {})'
).format(
start_ind + 1, end_ind, total_num_images, start_ind + i + 1,
start_ind + num_images, ave_time, eta
)
)
return roidb_boxes, roidb_scores, roidb_ids
def im_proposals(model, im):
"""Generate RPN proposals on a single image."""
inputs = {}
inputs['data'], inputs['im_info'] = _get_image_blob(im)
scale = inputs['im_info'][0, 2]
for k, v in inputs.items():
workspace.FeedBlob(core.ScopedName(k), v.astype(np.float32, copy=False))
workspace.RunNet(model.net.Proto().name)
if cfg.FPN.FPN_ON and cfg.FPN.MULTILEVEL_RPN:
k_max = cfg.FPN.RPN_MAX_LEVEL
k_min = cfg.FPN.RPN_MIN_LEVEL
rois_names = [
core.ScopedName('rpn_rois_fpn' + str(l))
for l in range(k_min, k_max + 1)
]
score_names = [
core.ScopedName('rpn_roi_probs_fpn' + str(l))
for l in range(k_min, k_max + 1)
]
blobs = workspace.FetchBlobs(rois_names + score_names)
# Combine predictions across all levels and retain the top scoring
boxes = np.concatenate(blobs[:len(rois_names)])
scores = np.concatenate(blobs[len(rois_names):]).squeeze()
# Discussion: one could do NMS again after combining predictions from
# the different FPN levels. Conceptually, it's probably the right thing
# to do. For arbitrary reasons, the original FPN RPN implementation did
# not do another round of NMS.
inds = np.argsort(-scores)[:cfg.TEST.RPN_POST_NMS_TOP_N]
scores = scores[inds]
boxes = boxes[inds, :]
else:
boxes, scores = workspace.FetchBlobs(
[core.ScopedName('rpn_rois'),
core.ScopedName('rpn_roi_probs')]
)
scores = scores.squeeze()
# Column 0 is the batch index in the (batch ind, x1, y1, x2, y2) encoding,
# so we remove it since we just want to return boxes
# Scale proposals back to the original input image scale
boxes = boxes[:, 1:] / scale
return boxes, scores
def get_roidb(ind_range):
"""Get the roidb for the dataset specified in the global cfg. Optionally
restrict it to a range of indices if ind_range is a pair of integers.
"""
dataset = JsonDataset(cfg.TEST.DATASET)
roidb = dataset.get_roidb()
if ind_range is not None:
total_num_images = len(roidb)
start, end = ind_range
roidb = roidb[start:end]
else:
start = 0
end = len(roidb)
total_num_images = end
return roidb, start, end, total_num_images
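# E.g., get_roidb((0, 100)) restricts proposal generation to the first 100
# images while still reporting the full dataset size for progress logging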
def evaluate_proposal_file(dataset, proposal_file, output_dir):
"""Evaluate box proposal average recall."""
roidb = dataset.get_roidb(gt=True, proposal_file=proposal_file)
results = task_evaluation.evaluate_box_proposals(dataset, roidb)
task_evaluation.log_box_proposal_results(results)
recall_file = os.path.join(output_dir, 'rpn_proposal_recall.pkl')
save_object(results, recall_file)
return results
def _get_image_blob(im):
"""Converts an image into a network input.
Arguments:
im (ndarray): a color image in BGR order
Returns:
blob (ndarray): a data blob holding an image pyramid
        im_info (ndarray): a 1 x 3 array holding the (height, width, scale)
            of the network input
"""
im_orig = im.astype(np.float32, copy=True)
im_orig -= cfg.PIXEL_MEANS
im_shape = im_orig.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
processed_ims = []
assert len(cfg.TEST.SCALES) == 1
target_size = cfg.TEST.SCALES[0]
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than MAX_SIZE
if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR)
im_info = np.hstack((im.shape[:2], im_scale))[np.newaxis, :]
processed_ims.append(im)
# Create a blob to hold the input images
blob = im_list_to_blob(processed_ims)
return blob, im_info
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Inference functionality for most Detectron models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import defaultdict
import cv2
import logging
import numpy as np
from caffe2.python import core
from caffe2.python import workspace
import pycocotools.mask as mask_util
from core.config import cfg
from utils.timer import Timer
import modeling.FPN as fpn
import utils.blob as blob_utils
import utils.boxes as box_utils
import utils.image as image_utils
import utils.keypoints as keypoint_utils
logger = logging.getLogger(__name__)
def im_detect_all(model, im, box_proposals, timers=None):
if timers is None:
timers = defaultdict(Timer)
timers['im_detect_bbox'].tic()
if cfg.TEST.BBOX_AUG.ENABLED:
scores, boxes, im_scales = im_detect_bbox_aug(model, im, box_proposals)
else:
scores, boxes, im_scales = im_detect_bbox(model, im, box_proposals)
timers['im_detect_bbox'].toc()
    # scores and boxes are from the whole image after score thresholding and
    # NMS (they are not separated by class)
    # cls_boxes contains boxes and scores separated by class, in the format
    # used for evaluating results
timers['misc_bbox'].tic()
scores, boxes, cls_boxes = box_results_with_nms_and_limit(scores, boxes)
timers['misc_bbox'].toc()
if cfg.MODEL.MASK_ON and boxes.shape[0] > 0:
timers['im_detect_mask'].tic()
if cfg.TEST.MASK_AUG.ENABLED:
masks = im_detect_mask_aug(model, im, boxes)
else:
masks = im_detect_mask(model, im_scales, boxes)
timers['im_detect_mask'].toc()
timers['misc_mask'].tic()
cls_segms = segm_results(
cls_boxes, masks, boxes, im.shape[0], im.shape[1]
)
timers['misc_mask'].toc()
else:
cls_segms = None
if cfg.MODEL.KEYPOINTS_ON and boxes.shape[0] > 0:
timers['im_detect_keypoints'].tic()
if cfg.TEST.KPS_AUG.ENABLED:
heatmaps = im_detect_keypoints_aug(model, im, boxes)
else:
heatmaps = im_detect_keypoints(model, im_scales, boxes)
timers['im_detect_keypoints'].toc()
timers['misc_keypoints'].tic()
cls_keyps = keypoint_results(cls_boxes, heatmaps, boxes)
timers['misc_keypoints'].toc()
else:
cls_keyps = None
return cls_boxes, cls_segms, cls_keyps
def im_conv_body_only(model, im):
"""Runs `model.conv_body_net` on the given image `im`."""
im_blob, im_scale_factors = _get_image_blob(im)
workspace.FeedBlob(core.ScopedName('data'), im_blob)
workspace.RunNet(model.conv_body_net.Proto().name)
return im_scale_factors
def im_detect_bbox(model, im, boxes=None):
"""Bounding box object detection for an image with given box proposals.
Arguments:
model (DetectionModelHelper): the detection model to use
im (ndarray): color image to test (in BGR order)
boxes (ndarray): R x 4 array of object proposals in 0-indexed
[x1, y1, x2, y2] format, or None if using RPN
Returns:
scores (ndarray): R x K array of object class scores for K classes
(K includes background as object category 0)
boxes (ndarray): R x 4*K array of predicted bounding boxes
im_scales (list): list of image scales used in the input blob (as
returned by _get_blobs and for use with im_detect_mask, etc.)
"""
inputs, im_scales = _get_blobs(im, boxes)
# When mapping from image ROIs to feature map ROIs, there's some aliasing
# (some distinct image ROIs get mapped to the same feature ROI).
# Here, we identify duplicate feature ROIs, so we only compute features
# on the unique subset.
if cfg.DEDUP_BOXES > 0 and not cfg.MODEL.FASTER_RCNN:
v = np.array([1, 1e3, 1e6, 1e9, 1e12])
hashes = np.round(inputs['rois'] * cfg.DEDUP_BOXES).dot(v)
_, index, inv_index = np.unique(
hashes, return_index=True, return_inverse=True
)
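        # E.g., with DEDUP_BOXES = 1/16, proposals whose coordinates round to
        # the same 16-pixel feature-map cell hash to the same value and are
        # collapsed to a single RoI; inv_index restores the full set below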
inputs['rois'] = inputs['rois'][index, :]
boxes = boxes[index, :]
# Add multi-level rois for FPN
if cfg.FPN.MULTILEVEL_ROIS and not cfg.MODEL.FASTER_RCNN:
_add_multilevel_rois_for_test(inputs, 'rois')
for k, v in inputs.items():
workspace.FeedBlob(core.ScopedName(k), v)
workspace.RunNet(model.net.Proto().name)
# Read out blobs
if cfg.MODEL.FASTER_RCNN:
assert len(im_scales) == 1, \
'Only single-image / single-scale batch implemented'
rois = workspace.FetchBlob(core.ScopedName('rois'))
# unscale back to raw image space
boxes = rois[:, 1:5] / im_scales[0]
# Softmax class probabilities
scores = workspace.FetchBlob(core.ScopedName('cls_prob')).squeeze()
# In case there is 1 proposal
scores = scores.reshape([-1, scores.shape[-1]])
if cfg.TEST.BBOX_REG:
# Apply bounding-box regression deltas
box_deltas = workspace.FetchBlob(core.ScopedName('bbox_pred')).squeeze()
# In case there is 1 proposal
box_deltas = box_deltas.reshape([-1, box_deltas.shape[-1]])
if cfg.MODEL.CLS_AGNOSTIC_BBOX_REG:
# Remove predictions for bg class (compat with MSRA code)
box_deltas = box_deltas[:, -4:]
pred_boxes = box_utils.bbox_transform(
boxes, box_deltas, cfg.MODEL.BBOX_REG_WEIGHTS
)
pred_boxes = box_utils.clip_tiled_boxes(pred_boxes, im.shape)
if cfg.MODEL.CLS_AGNOSTIC_BBOX_REG:
pred_boxes = np.tile(pred_boxes, (1, scores.shape[1]))
else:
# Simply repeat the boxes, once for each class
pred_boxes = np.tile(boxes, (1, scores.shape[1]))
if cfg.DEDUP_BOXES > 0 and not cfg.MODEL.FASTER_RCNN:
# Map scores and predictions back to the original set of boxes
scores = scores[inv_index, :]
pred_boxes = pred_boxes[inv_index, :]
return scores, pred_boxes, im_scales
def im_detect_bbox_aug(model, im, box_proposals=None):
"""Performs bbox detection with test-time augmentations.
Function signature is the same as for im_detect_bbox.
"""
assert not cfg.TEST.BBOX_AUG.SCALE_SIZE_DEP, \
'Size dependent scaling not implemented'
assert not cfg.TEST.BBOX_AUG.SCORE_HEUR == 'UNION' or \
cfg.TEST.BBOX_AUG.COORD_HEUR == 'UNION', \
'Coord heuristic must be union whenever score heuristic is union'
assert not cfg.TEST.BBOX_AUG.COORD_HEUR == 'UNION' or \
cfg.TEST.BBOX_AUG.SCORE_HEUR == 'UNION', \
'Score heuristic must be union whenever coord heuristic is union'
assert not cfg.MODEL.FASTER_RCNN or \
cfg.TEST.BBOX_AUG.SCORE_HEUR == 'UNION', \
'Union heuristic must be used to combine Faster RCNN predictions'
# Collect detections computed under different transformations
scores_ts = []
boxes_ts = []
def add_preds_t(scores_t, boxes_t):
scores_ts.append(scores_t)
boxes_ts.append(boxes_t)
# Perform detection on the horizontally flipped image
if cfg.TEST.BBOX_AUG.H_FLIP:
scores_hf, boxes_hf, _im_scales_hf = im_detect_bbox_hflip(
model, im, box_proposals
)
add_preds_t(scores_hf, boxes_hf)
# Compute detections at different scales
for scale in cfg.TEST.BBOX_AUG.SCALES:
max_size = cfg.TEST.BBOX_AUG.MAX_SIZE
scores_scl, boxes_scl = im_detect_bbox_scale(
model, im, scale, max_size, box_proposals
)
add_preds_t(scores_scl, boxes_scl)
if cfg.TEST.BBOX_AUG.SCALE_H_FLIP:
scores_scl_hf, boxes_scl_hf = im_detect_bbox_scale(
model, im, scale, max_size, box_proposals, hflip=True
)
add_preds_t(scores_scl_hf, boxes_scl_hf)
# Perform detection at different aspect ratios
for aspect_ratio in cfg.TEST.BBOX_AUG.ASPECT_RATIOS:
scores_ar, boxes_ar = im_detect_bbox_aspect_ratio(
model, im, aspect_ratio, box_proposals
)
add_preds_t(scores_ar, boxes_ar)
if cfg.TEST.BBOX_AUG.ASPECT_RATIO_H_FLIP:
scores_ar_hf, boxes_ar_hf = im_detect_bbox_aspect_ratio(
model, im, aspect_ratio, box_proposals, hflip=True
)
add_preds_t(scores_ar_hf, boxes_ar_hf)
# Compute detections for the original image (identity transform) last to
# ensure that the Caffe2 workspace is populated with blobs corresponding
# to the original image on return (postcondition of im_detect_bbox)
scores_i, boxes_i, im_scales_i = im_detect_bbox(model, im, box_proposals)
add_preds_t(scores_i, boxes_i)
# Combine the predicted scores
if cfg.TEST.BBOX_AUG.SCORE_HEUR == 'ID':
scores_c = scores_i
elif cfg.TEST.BBOX_AUG.SCORE_HEUR == 'AVG':
scores_c = np.mean(scores_ts, axis=0)
elif cfg.TEST.BBOX_AUG.SCORE_HEUR == 'UNION':
scores_c = np.vstack(scores_ts)
else:
raise NotImplementedError(
'Score heur {} not supported'.format(cfg.TEST.BBOX_AUG.SCORE_HEUR)
)
# Combine the predicted boxes
if cfg.TEST.BBOX_AUG.COORD_HEUR == 'ID':
boxes_c = boxes_i
elif cfg.TEST.BBOX_AUG.COORD_HEUR == 'AVG':
boxes_c = np.mean(boxes_ts, axis=0)
elif cfg.TEST.BBOX_AUG.COORD_HEUR == 'UNION':
boxes_c = np.vstack(boxes_ts)
else:
raise NotImplementedError(
'Coord heur {} not supported'.format(cfg.TEST.BBOX_AUG.COORD_HEUR)
)
return scores_c, boxes_c, im_scales_i
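# A minimal config sketch for enabling this path (key names taken from the
# cfg lookups above; TEST.BBOX_AUG.ENABLED is assumed and not shown in this
# file):
#   TEST:
#     BBOX_AUG:
#       ENABLED: True
#       H_FLIP: True
#       SCALES: (400, 600, 1000)
#       MAX_SIZE: 1500
#       SCALE_H_FLIP: True
#       SCORE_HEUR: 'UNION'
#       COORD_HEUR: 'UNION'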
def im_detect_bbox_hflip(model, im, box_proposals=None):
"""Performs bbox detection on the horizontally flipped image.
Function signature is the same as for im_detect_bbox.
"""
# Compute predictions on the flipped image
im_hf = im[:, ::-1, :]
im_width = im.shape[1]
if not cfg.MODEL.FASTER_RCNN:
box_proposals_hf = box_utils.flip_boxes(box_proposals, im_width)
else:
box_proposals_hf = None
scores_hf, boxes_hf, im_scales = im_detect_bbox(
model, im_hf, box_proposals_hf
)
# Invert the detections computed on the flipped image
boxes_inv = box_utils.flip_boxes(boxes_hf, im_width)
return scores_hf, boxes_inv, im_scales
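# Property sketch (assuming the usual 0-indexed flip convention in
# box_utils.flip_boxes): for an image of width W, a box (x1, y1, x2, y2)
# maps to (W - x2 - 1, y1, W - x1 - 1, y2), so flipping twice is the
# identity and the inversion above restores original-image coordinates.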
def im_detect_bbox_scale(
model, im, scale, max_size, box_proposals=None, hflip=False
):
"""Computes bbox detections at the given scale.
Returns predictions in the original image space.
"""
# Remember the original scale
orig_scales = cfg.TEST.SCALES
orig_max_size = cfg.TEST.MAX_SIZE
# Perform detection at the given scale
cfg.TEST.SCALES = (scale, )
cfg.TEST.MAX_SIZE = max_size
if hflip:
scores_scl, boxes_scl, _ = im_detect_bbox_hflip(
model, im, box_proposals
)
else:
scores_scl, boxes_scl, _ = im_detect_bbox(model, im, box_proposals)
# Restore the original scale
cfg.TEST.SCALES = orig_scales
cfg.TEST.MAX_SIZE = orig_max_size
return scores_scl, boxes_scl
def im_detect_bbox_aspect_ratio(
model, im, aspect_ratio, box_proposals=None, hflip=False
):
"""Computes bbox detections at the given width-relative aspect ratio.
Returns predictions in the original image space.
"""
# Compute predictions on the transformed image
im_ar = image_utils.aspect_ratio_rel(im, aspect_ratio)
if not cfg.MODEL.FASTER_RCNN:
box_proposals_ar = box_utils.aspect_ratio(box_proposals, aspect_ratio)
else:
box_proposals_ar = None
if hflip:
scores_ar, boxes_ar, _ = im_detect_bbox_hflip(
model, im_ar, box_proposals_ar
)
else:
scores_ar, boxes_ar, _ = im_detect_bbox(model, im_ar, box_proposals_ar)
# Invert the detected boxes
boxes_inv = box_utils.aspect_ratio(boxes_ar, 1.0 / aspect_ratio)
return scores_ar, boxes_inv
def im_detect_mask(model, im_scales, boxes):
"""Infer instance segmentation masks. This function must be called after
im_detect_bbox as it assumes that the Caffe2 workspace is already populated
with the necessary blobs.
Arguments:
model (DetectionModelHelper): the detection model to use
im_scales (list): image blob scales as returned by im_detect_bbox
boxes (ndarray): R x 4 array of bounding box detections (e.g., as
returned by im_detect_bbox)
Returns:
pred_masks (ndarray): R x K x M x M array of class specific soft masks
output by the network (must be processed by segm_results to convert
into hard masks in the original image coordinate space)
"""
assert len(im_scales) == 1, \
'Only single-image / single-scale batch implemented'
M = cfg.MRCNN.RESOLUTION
if boxes.shape[0] == 0:
pred_masks = np.zeros((0, M, M), np.float32)
return pred_masks
inputs = {'mask_rois': _get_rois_blob(boxes, im_scales)}
# Add multi-level rois for FPN
if cfg.FPN.MULTILEVEL_ROIS:
_add_multilevel_rois_for_test(inputs, 'mask_rois')
for k, v in inputs.items():
workspace.FeedBlob(core.ScopedName(k), v)
workspace.RunNet(model.mask_net.Proto().name)
# Fetch masks
pred_masks = workspace.FetchBlob(
core.ScopedName('mask_fcn_probs')
).squeeze()
if cfg.MRCNN.CLS_SPECIFIC_MASK:
pred_masks = pred_masks.reshape([-1, cfg.MODEL.NUM_CLASSES, M, M])
else:
pred_masks = pred_masks.reshape([-1, 1, M, M])
return pred_masks
def im_detect_mask_aug(model, im, boxes):
"""Performs mask detection with test-time augmentations.
Arguments:
model (DetectionModelHelper): the detection model to use
im (ndarray): BGR image to test
boxes (ndarray): R x 4 array of bounding boxes
Returns:
masks (ndarray): R x K x M x M array of class specific soft masks
"""
assert not cfg.TEST.MASK_AUG.SCALE_SIZE_DEP, \
'Size dependent scaling not implemented'
# Collect masks computed under different transformations
masks_ts = []
# Compute masks for the original image (identity transform)
im_scales_i = im_conv_body_only(model, im)
masks_i = im_detect_mask(model, im_scales_i, boxes)
masks_ts.append(masks_i)
# Perform mask detection on the horizontally flipped image
if cfg.TEST.MASK_AUG.H_FLIP:
masks_hf = im_detect_mask_hflip(model, im, boxes)
masks_ts.append(masks_hf)
# Compute detections at different scales
for scale in cfg.TEST.MASK_AUG.SCALES:
max_size = cfg.TEST.MASK_AUG.MAX_SIZE
masks_scl = im_detect_mask_scale(model, im, scale, max_size, boxes)
masks_ts.append(masks_scl)
if cfg.TEST.MASK_AUG.SCALE_H_FLIP:
masks_scl_hf = im_detect_mask_scale(
model, im, scale, max_size, boxes, hflip=True
)
masks_ts.append(masks_scl_hf)
# Compute masks at different aspect ratios
for aspect_ratio in cfg.TEST.MASK_AUG.ASPECT_RATIOS:
masks_ar = im_detect_mask_aspect_ratio(model, im, aspect_ratio, boxes)
masks_ts.append(masks_ar)
if cfg.TEST.MASK_AUG.ASPECT_RATIO_H_FLIP:
masks_ar_hf = im_detect_mask_aspect_ratio(
model, im, aspect_ratio, boxes, hflip=True
)
masks_ts.append(masks_ar_hf)
# Combine the predicted soft masks
if cfg.TEST.MASK_AUG.HEUR == 'SOFT_AVG':
masks_c = np.mean(masks_ts, axis=0)
elif cfg.TEST.MASK_AUG.HEUR == 'SOFT_MAX':
masks_c = np.amax(masks_ts, axis=0)
elif cfg.TEST.MASK_AUG.HEUR == 'LOGIT_AVG':
def logit(y):
return -1.0 * np.log((1.0 - y) / np.maximum(y, 1e-20))
logit_masks = [logit(y) for y in masks_ts]
logit_masks = np.mean(logit_masks, axis=0)
masks_c = 1.0 / (1.0 + np.exp(-logit_masks))
else:
raise NotImplementedError(
'Heuristic {} not supported'.format(cfg.TEST.MASK_AUG.HEUR)
)
return masks_c
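# Worked example of LOGIT_AVG vs SOFT_AVG (illustrative numbers): averaging
# probabilities 0.9 and 0.5 directly gives 0.7, while averaging their logits,
# logit(0.9) ~= 2.197 and logit(0.5) = 0.0, gives 1.099 and
# sigmoid(1.099) ~= 0.75, so logit averaging weights confident predictions
# more heavily than a plain mean.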
def im_detect_mask_hflip(model, im, boxes):
"""Performs mask detection on the horizontally flipped image.
Function signature is the same as for im_detect_mask_aug.
"""
# Compute the masks for the flipped image
im_hf = im[:, ::-1, :]
boxes_hf = box_utils.flip_boxes(boxes, im.shape[1])
im_scales = im_conv_body_only(model, im_hf)
masks_hf = im_detect_mask(model, im_scales, boxes_hf)
# Invert the predicted soft masks
masks_inv = masks_hf[:, :, :, ::-1]
return masks_inv
def im_detect_mask_scale(model, im, scale, max_size, boxes, hflip=False):
"""Computes masks at the given scale."""
# Remember the original scale
orig_scales = cfg.TEST.SCALES
orig_max_size = cfg.TEST.MAX_SIZE
# Perform mask detection at the given scale
cfg.TEST.SCALES = (scale, )
cfg.TEST.MAX_SIZE = max_size
if hflip:
masks_scl = im_detect_mask_hflip(model, im, boxes)
else:
im_scales = im_conv_body_only(model, im)
masks_scl = im_detect_mask(model, im_scales, boxes)
# Restore the original scale
cfg.TEST.SCALES = orig_scales
cfg.TEST.MAX_SIZE = orig_max_size
return masks_scl
def im_detect_mask_aspect_ratio(model, im, aspect_ratio, boxes, hflip=False):
"""Computes mask detections at the given width-relative aspect ratio."""
# Perform mask detection on the transformed image
im_ar = image_utils.aspect_ratio_rel(im, aspect_ratio)
boxes_ar = box_utils.aspect_ratio(boxes, aspect_ratio)
if hflip:
masks_ar = im_detect_mask_hflip(model, im_ar, boxes_ar)
else:
im_scales = im_conv_body_only(model, im_ar)
masks_ar = im_detect_mask(model, im_scales, boxes_ar)
return masks_ar
def im_detect_keypoints(model, im_scales, boxes):
"""Infer instance keypoint poses. This function must be called after
im_detect_bbox as it assumes that the Caffe2 workspace is already populated
with the necessary blobs.
Arguments:
model (DetectionModelHelper): the detection model to use
im_scales (list): image blob scales as returned by im_detect_bbox
boxes (ndarray): R x 4 array of bounding box detections (e.g., as
returned by im_detect_bbox)
Returns:
pred_heatmaps (ndarray): R x J x M x M array of keypoint location
logits (softmax inputs) for each of the J keypoint types output
by the network (must be processed by keypoint_results to convert
into point predictions in the original image coordinate space)
"""
assert len(im_scales) == 1, \
'Only single-image / single-scale batch implemented'
M = cfg.KRCNN.HEATMAP_SIZE
if boxes.shape[0] == 0:
pred_heatmaps = np.zeros((0, cfg.KRCNN.NUM_KEYPOINTS, M, M), np.float32)
return pred_heatmaps
inputs = {'keypoint_rois': _get_rois_blob(boxes, im_scales)}
# Add multi-level rois for FPN
if cfg.FPN.MULTILEVEL_ROIS:
_add_multilevel_rois_for_test(inputs, 'keypoint_rois')
for k, v in inputs.items():
workspace.FeedBlob(core.ScopedName(k), v)
workspace.RunNet(model.keypoint_net.Proto().name)
pred_heatmaps = workspace.FetchBlob(core.ScopedName('kps_score')).squeeze()
# In case there is only 1 roi, restore the batch dimension removed by squeeze
if pred_heatmaps.ndim == 3:
pred_heatmaps = np.expand_dims(pred_heatmaps, axis=0)
return pred_heatmaps
def im_detect_keypoints_aug(model, im, boxes):
"""Computes keypoint predictions with test-time augmentations.
Arguments:
model (DetectionModelHelper): the detection model to use
im (ndarray): BGR image to test
boxes (ndarray): R x 4 array of bounding boxes
Returns:
heatmaps (ndarray): R x J x M x M array of keypoint location logits
"""
# Collect heatmaps predicted under different transformations
heatmaps_ts = []
# Tag predictions computed under downscaling and upscaling transformations
ds_ts = []
us_ts = []
def add_heatmaps_t(heatmaps_t, ds_t=False, us_t=False):
heatmaps_ts.append(heatmaps_t)
ds_ts.append(ds_t)
us_ts.append(us_t)
# Compute the heatmaps for the original image (identity transform)
im_scales = im_conv_body_only(model, im)
heatmaps_i = im_detect_keypoints(model, im_scales, boxes)
add_heatmaps_t(heatmaps_i)
# Perform keypoints detection on the horizontally flipped image
if cfg.TEST.KPS_AUG.H_FLIP:
heatmaps_hf = im_detect_keypoints_hflip(model, im, boxes)
add_heatmaps_t(heatmaps_hf)
# Compute detections at different scales
for scale in cfg.TEST.KPS_AUG.SCALES:
ds_scl = scale < cfg.TEST.SCALES[0]
us_scl = scale > cfg.TEST.SCALES[0]
heatmaps_scl = im_detect_keypoints_scale(
model, im, scale, cfg.TEST.KPS_AUG.MAX_SIZE, boxes
)
add_heatmaps_t(heatmaps_scl, ds_scl, us_scl)
if cfg.TEST.KPS_AUG.SCALE_H_FLIP:
heatmaps_scl_hf = im_detect_keypoints_scale(
model, im, scale, cfg.TEST.KPS_AUG.MAX_SIZE, boxes, hflip=True
)
add_heatmaps_t(heatmaps_scl_hf, ds_scl, us_scl)
# Compute keypoints at different aspect ratios
for aspect_ratio in cfg.TEST.KPS_AUG.ASPECT_RATIOS:
heatmaps_ar = im_detect_keypoints_aspect_ratio(
model, im, aspect_ratio, boxes
)
add_heatmaps_t(heatmaps_ar)
if cfg.TEST.KPS_AUG.ASPECT_RATIO_H_FLIP:
heatmaps_ar_hf = im_detect_keypoints_aspect_ratio(
model, im, aspect_ratio, boxes, hflip=True
)
add_heatmaps_t(heatmaps_ar_hf)
# Select the heuristic function for combining the heatmaps
if cfg.TEST.KPS_AUG.HEUR == 'HM_AVG':
np_f = np.mean
elif cfg.TEST.KPS_AUG.HEUR == 'HM_MAX':
np_f = np.amax
else:
raise NotImplementedError(
'Heuristic {} not supported'.format(cfg.TEST.KPS_AUG.HEUR)
)
def heur_f(hms_ts):
return np_f(hms_ts, axis=0)
# Combine the heatmaps
if cfg.TEST.KPS_AUG.SCALE_SIZE_DEP:
heatmaps_c = combine_heatmaps_size_dep(
heatmaps_ts, ds_ts, us_ts, boxes, heur_f
)
else:
heatmaps_c = heur_f(heatmaps_ts)
return heatmaps_c
def im_detect_keypoints_hflip(model, im, boxes):
"""Computes keypoint predictions on the horizontally flipped image.
Function signature is the same as for im_detect_keypoints_aug.
"""
# Compute keypoints for the flipped image
im_hf = im[:, ::-1, :]
boxes_hf = box_utils.flip_boxes(boxes, im.shape[1])
im_scales = im_conv_body_only(model, im_hf)
heatmaps_hf = im_detect_keypoints(model, im_scales, boxes_hf)
# Invert the predicted keypoints
heatmaps_inv = keypoint_utils.flip_heatmaps(heatmaps_hf)
return heatmaps_inv
def im_detect_keypoints_scale(model, im, scale, max_size, boxes, hflip=False):
"""Computes keypoint predictions at the given scale."""
# Store the original scale
orig_scales = cfg.TEST.SCALES
orig_max_size = cfg.TEST.MAX_SIZE
# Perform detection at the given scale
cfg.TEST.SCALES = (scale, )
cfg.TEST.MAX_SIZE = max_size
if hflip:
heatmaps_scl = im_detect_keypoints_hflip(model, im, boxes)
else:
im_scales = im_conv_body_only(model, im)
heatmaps_scl = im_detect_keypoints(model, im_scales, boxes)
# Restore the original scale
cfg.TEST.SCALES = orig_scales
cfg.TEST.MAX_SIZE = orig_max_size
return heatmaps_scl
def im_detect_keypoints_aspect_ratio(
model, im, aspect_ratio, boxes, hflip=False
):
"""Detects keypoints at the given width-relative aspect ratio."""
# Perform keypoint detection on the transformed image
im_ar = image_utils.aspect_ratio_rel(im, aspect_ratio)
boxes_ar = box_utils.aspect_ratio(boxes, aspect_ratio)
if hflip:
heatmaps_ar = im_detect_keypoints_hflip(model, im_ar, boxes_ar)
else:
im_scales = im_conv_body_only(model, im_ar)
heatmaps_ar = im_detect_keypoints(model, im_scales, boxes_ar)
return heatmaps_ar
def combine_heatmaps_size_dep(hms_ts, ds_ts, us_ts, boxes, heur_f):
"""Combines heatmaps while taking object sizes into account."""
assert len(hms_ts) == len(ds_ts) and len(ds_ts) == len(us_ts), \
'All sets of hms must be tagged with downscaling and upscaling flags'
# Classify objects into small+medium and large based on their box areas
areas = box_utils.boxes_area(boxes)
sm_objs = areas < cfg.TEST.KPS_AUG.AREA_TH
l_objs = areas >= cfg.TEST.KPS_AUG.AREA_TH
# Combine heatmaps computed under different transformations for each object
hms_c = np.zeros_like(hms_ts[0])
for i in range(hms_c.shape[0]):
hms_to_combine = []
for hms_t, ds_t, us_t in zip(hms_ts, ds_ts, us_ts):
# Discard downscaling predictions for small and medium objects
if sm_objs[i] and ds_t:
continue
# Discard upscaling predictions for large objects
if l_objs[i] and us_t:
continue
hms_to_combine.append(hms_t[i])
hms_c[i] = heur_f(hms_to_combine)
return hms_c
def box_results_with_nms_and_limit(scores, boxes):
"""Returns bounding-box detection results by thresholding on scores and
applying non-maximum suppression (NMS).
`boxes` has shape (#detections, 4 * #classes), where each row represents
a list of predicted bounding boxes for each of the object classes in the
dataset (including the background class). The detections in each row
originate from the same object proposal.
`scores` has shape (#detections, #classes), where each row represents a list
of object detection confidence scores for each of the object classes in the
dataset (including the background class). `scores[i, j]` corresponds to the
box at `boxes[i, j * 4:(j + 1) * 4]`.
"""
num_classes = cfg.MODEL.NUM_CLASSES
cls_boxes = [[] for _ in range(num_classes)]
# Apply threshold on detection probabilities and apply NMS
# Skip j = 0, because it's the background class
for j in range(1, num_classes):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
scores_j = scores[inds, j]
boxes_j = boxes[inds, j * 4:(j + 1) * 4]
dets_j = np.hstack((boxes_j, scores_j[:, np.newaxis])).astype(
np.float32, copy=False
)
if cfg.TEST.SOFT_NMS.ENABLED:
nms_dets, _ = box_utils.soft_nms(
dets_j,
sigma=cfg.TEST.SOFT_NMS.SIGMA,
overlap_thresh=cfg.TEST.NMS,
score_thresh=0.0001,
method=cfg.TEST.SOFT_NMS.METHOD
)
else:
keep = box_utils.nms(dets_j, cfg.TEST.NMS)
nms_dets = dets_j[keep, :]
# Refine the post-NMS boxes using bounding-box voting
if cfg.TEST.BBOX_VOTE.ENABLED:
nms_dets = box_utils.box_voting(
nms_dets,
dets_j,
cfg.TEST.BBOX_VOTE.VOTE_TH,
scoring_method=cfg.TEST.BBOX_VOTE.SCORING_METHOD
)
cls_boxes[j] = nms_dets
# Limit to max_per_image detections **over all classes**
if cfg.TEST.DETECTIONS_PER_IM > 0:
image_scores = np.hstack(
[cls_boxes[j][:, -1] for j in range(1, num_classes)]
)
if len(image_scores) > cfg.TEST.DETECTIONS_PER_IM:
image_thresh = np.sort(image_scores)[-cfg.TEST.DETECTIONS_PER_IM]
for j in range(1, num_classes):
keep = np.where(cls_boxes[j][:, -1] >= image_thresh)[0]
cls_boxes[j] = cls_boxes[j][keep, :]
im_results = np.vstack([cls_boxes[j] for j in range(1, num_classes)])
boxes = im_results[:, :-1]
scores = im_results[:, -1]
return scores, boxes, cls_boxes
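# Sketch of the calling convention (hypothetical snippet, consistent with the
# functions above):
#   scores, boxes, im_scales = im_detect_bbox(model, im)
#   scores, boxes, cls_boxes = box_results_with_nms_and_limit(scores, boxes)
# where cls_boxes[j] is an N_j x 5 array of (x1, y1, x2, y2, score) rows for
# class j.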
def segm_results(cls_boxes, masks, ref_boxes, im_h, im_w):
num_classes = cfg.MODEL.NUM_CLASSES
cls_segms = [[] for _ in range(num_classes)]
mask_ind = 0
# To work around an issue with cv2.resize (it seems to automatically pad
# with repeated border values), we manually zero-pad the masks by 1 pixel
# prior to resizing back to the original image resolution. This prevents
# "top hat" artifacts. We therefore need to expand the reference boxes by an
# appropriate factor.
M = cfg.MRCNN.RESOLUTION
scale = (M + 2.0) / M
ref_boxes = box_utils.expand_boxes(ref_boxes, scale)
ref_boxes = ref_boxes.astype(np.int32)
padded_mask = np.zeros((M + 2, M + 2), dtype=np.float32)
# skip j = 0, because it's the background class
for j in range(1, num_classes):
segms = []
for _ in range(cls_boxes[j].shape[0]):
if cfg.MRCNN.CLS_SPECIFIC_MASK:
padded_mask[1:-1, 1:-1] = masks[mask_ind, j, :, :]
else:
padded_mask[1:-1, 1:-1] = masks[mask_ind, 0, :, :]
ref_box = ref_boxes[mask_ind, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > cfg.MRCNN.THRESH_BINARIZE, dtype=np.uint8)
im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
x_0 = max(ref_box[0], 0)
x_1 = min(ref_box[2] + 1, im_w)
y_0 = max(ref_box[1], 0)
y_1 = min(ref_box[3] + 1, im_h)
im_mask[y_0:y_1, x_0:x_1] = mask[
(y_0 - ref_box[1]):(y_1 - ref_box[1]),
(x_0 - ref_box[0]):(x_1 - ref_box[0])
]
# Get RLE encoding used by the COCO evaluation API
rle = mask_util.encode(
np.array(im_mask[:, :, np.newaxis], order='F')
)[0]
segms.append(rle)
mask_ind += 1
cls_segms[j] = segms
assert mask_ind == masks.shape[0]
return cls_segms
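# Padding arithmetic, for illustration: with M = 14, the soft mask is
# zero-padded to 16 x 16, so the reference box must grow by
# (M + 2) / M = 16 / 14 ~= 1.143 before resizing to keep the mask content
# aligned with the original box extent.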
def keypoint_results(cls_boxes, pred_heatmaps, ref_boxes):
num_classes = cfg.MODEL.NUM_CLASSES
cls_keyps = [[] for _ in range(num_classes)]
person_idx = keypoint_utils.get_person_class_index()
xy_preds = keypoint_utils.heatmaps_to_keypoints(pred_heatmaps, ref_boxes)
# NMS OKS
if cfg.KRCNN.NMS_OKS:
keep = keypoint_utils.nms_oks(xy_preds, ref_boxes, 0.3)
xy_preds = xy_preds[keep, :, :]
ref_boxes = ref_boxes[keep, :]
pred_heatmaps = pred_heatmaps[keep, :, :, :]
cls_boxes[person_idx] = cls_boxes[person_idx][keep, :]
kps = [xy_preds[i] for i in range(xy_preds.shape[0])]
cls_keyps[person_idx] = kps
return cls_keyps
def _get_image_blob(im):
"""Converts an image into a network input.
Arguments:
im (ndarray): a color image in BGR order
Returns:
blob (ndarray): a data blob holding an image pyramid
im_scale_factors (ndarray): array of image scales (relative to im) used
in the image pyramid
"""
processed_ims, im_scale_factors = blob_utils.prep_im_for_blob(
im, cfg.PIXEL_MEANS, cfg.TEST.SCALES, cfg.TEST.MAX_SIZE
)
blob = blob_utils.im_list_to_blob(processed_ims)
return blob, np.array(im_scale_factors)
def _get_rois_blob(im_rois, im_scale_factors):
"""Converts RoIs into network inputs.
Arguments:
im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
im_scale_factors (list): scale factors as returned by _get_image_blob
Returns:
blob (ndarray): R x 5 matrix of RoIs in the image pyramid with columns
[level, x1, y1, x2, y2]
"""
rois, levels = _project_im_rois(im_rois, im_scale_factors)
rois_blob = np.hstack((levels, rois))
return rois_blob.astype(np.float32, copy=False)
def _project_im_rois(im_rois, scales):
"""Project image RoIs into the image pyramid built by _get_image_blob.
Arguments:
im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
scales (list): scale factors as returned by _get_image_blob
Returns:
rois (ndarray): R x 4 matrix of projected RoI coordinates
levels (ndarray): image pyramid levels used by each projected RoI
"""
im_rois = im_rois.astype(np.float, copy=False)
if len(scales) > 1:
widths = im_rois[:, 2] - im_rois[:, 0] + 1
heights = im_rois[:, 3] - im_rois[:, 1] + 1
areas = widths * heights
scaled_areas = areas[:, np.newaxis] * (scales[np.newaxis, :]**2)
diff_areas = np.abs(scaled_areas - 224 * 224)
levels = diff_areas.argmin(axis=1)[:, np.newaxis]
else:
levels = np.zeros((im_rois.shape[0], 1), dtype=np.int)
rois = im_rois * scales[levels]
return rois, levels
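# Level selection example (illustrative numbers): a 100 x 100 RoI under
# scales (1.0, 2.0) has scaled areas 1e4 and 4e4; since |1e4 - 224**2| ~=
# 4.0e4 and |4e4 - 224**2| ~= 1.0e4, the RoI is assigned to the 2.0x level,
# the one whose scaled area is closest to the canonical 224 x 224.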
def _add_multilevel_rois_for_test(blobs, name):
"""Distributes a set of RoIs across FPN pyramid levels by creating new level
specific RoI blobs.
Arguments:
blobs (dict): dictionary of blobs
name (str): a key in 'blobs' identifying the source RoI blob
Returns:
[by ref] blobs (dict): new keys named by `name + 'fpn' + level`
are added to dict each with a value that's an R_level x 5 ndarray of
RoIs (see _get_rois_blob for format)
"""
lvl_min = cfg.FPN.ROI_MIN_LEVEL
lvl_max = cfg.FPN.ROI_MAX_LEVEL
lvls = fpn.map_rois_to_fpn_levels(blobs[name][:, 1:5], lvl_min, lvl_max)
fpn.add_multilevel_roi_blobs(
blobs, name, blobs[name], lvls, lvl_min, lvl_max
)
def _get_blobs(im, rois):
"""Convert an image and RoIs within that image into network inputs."""
blobs = {}
blobs['data'], im_scale_factors = _get_image_blob(im)
if cfg.MODEL.FASTER_RCNN and rois is None:
height, width = blobs['data'].shape[2], blobs['data'].shape[3]
scale = im_scale_factors[0]
blobs['im_info'] = np.array([[height, width, scale]], dtype=np.float32)
if rois is not None:
blobs['rois'] = _get_rois_blob(rois, im_scale_factors)
return blobs, im_scale_factors
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Test a Detectron network on an imdb (image database)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import defaultdict
import cv2
import datetime
import logging
import numpy as np
import os
import yaml
from caffe2.python import workspace
from core.config import cfg
from core.config import get_output_dir
from core.test import im_detect_all
from datasets import task_evaluation
from datasets.json_dataset import JsonDataset
from modeling import model_builder
from utils.io import save_object
from utils.timer import Timer
import utils.c2 as c2_utils
import utils.env as envu
import utils.net as net_utils
import utils.subprocess as subprocess_utils
import utils.vis as vis_utils
logger = logging.getLogger(__name__)
def test_net_on_dataset(multi_gpu=False):
"""Run inference on a dataset."""
output_dir = get_output_dir(training=False)
dataset = JsonDataset(cfg.TEST.DATASET)
test_timer = Timer()
test_timer.tic()
if multi_gpu:
num_images = len(dataset.get_roidb())
all_boxes, all_segms, all_keyps = multi_gpu_test_net_on_dataset(
num_images, output_dir
)
else:
all_boxes, all_segms, all_keyps = test_net()
test_timer.toc()
logger.info('Total inference time: {:.3f}s'.format(test_timer.average_time))
results = task_evaluation.evaluate_all(
dataset, all_boxes, all_segms, all_keyps, output_dir
)
return results
def multi_gpu_test_net_on_dataset(num_images, output_dir):
"""Multi-gpu inference on a dataset."""
binary_dir = envu.get_runtime_dir()
binary_ext = envu.get_py_bin_ext()
binary = os.path.join(binary_dir, 'test_net' + binary_ext)
assert os.path.exists(binary), 'Binary \'{}\' not found'.format(binary)
# Run inference in parallel in subprocesses
# Outputs will be a list of outputs from each subprocess, where the output
# of each subprocess is the dictionary saved by test_net().
outputs = subprocess_utils.process_in_parallel(
'detection', num_images, binary, output_dir
)
# Collate the results from each subprocess
all_boxes = [[] for _ in range(cfg.MODEL.NUM_CLASSES)]
all_segms = [[] for _ in range(cfg.MODEL.NUM_CLASSES)]
all_keyps = [[] for _ in range(cfg.MODEL.NUM_CLASSES)]
for det_data in outputs:
all_boxes_batch = det_data['all_boxes']
all_segms_batch = det_data['all_segms']
all_keyps_batch = det_data['all_keyps']
for cls_idx in range(1, cfg.MODEL.NUM_CLASSES):
all_boxes[cls_idx] += all_boxes_batch[cls_idx]
all_segms[cls_idx] += all_segms_batch[cls_idx]
all_keyps[cls_idx] += all_keyps_batch[cls_idx]
det_file = os.path.join(output_dir, 'detections.pkl')
cfg_yaml = yaml.dump(cfg)
save_object(
dict(
all_boxes=all_boxes,
all_segms=all_segms,
all_keyps=all_keyps,
cfg=cfg_yaml
), det_file
)
logger.info('Wrote detections to: {}'.format(os.path.abspath(det_file)))
return all_boxes, all_segms, all_keyps
def test_net(ind_range=None):
"""Run inference on all images in a dataset or over an index range of images
in a dataset using a single GPU.
"""
assert cfg.TEST.WEIGHTS != '', \
'TEST.WEIGHTS must be set to the model file to test'
assert not cfg.MODEL.RPN_ONLY, \
'Use rpn_generate to generate proposals from RPN-only models'
assert cfg.TEST.DATASET != '', \
'TEST.DATASET must be set to the dataset name to test'
output_dir = get_output_dir(training=False)
roidb, dataset, start_ind, end_ind, total_num_images = get_roidb_and_dataset(
ind_range
)
model = initialize_model_from_cfg()
num_images = len(roidb)
num_classes = cfg.MODEL.NUM_CLASSES
all_boxes, all_segms, all_keyps = empty_results(num_classes, num_images)
timers = defaultdict(Timer)
for i, entry in enumerate(roidb):
if cfg.MODEL.FASTER_RCNN:
# Faster R-CNN type models generate proposals on-the-fly with an
# in-network RPN
box_proposals = None
else:
# The roidb may contain ground-truth rois (for example, if the roidb
# comes from the training or val split). We only want to evaluate
# detection on the *non*-ground-truth rois. We select only the rois
# that have the gt_classes field set to 0, which means there's no
# ground truth.
box_proposals = entry['boxes'][entry['gt_classes'] == 0]
if len(box_proposals) == 0:
continue
im = cv2.imread(entry['image'])
with c2_utils.NamedCudaScope(0):
cls_boxes_i, cls_segms_i, cls_keyps_i = im_detect_all(
model, im, box_proposals, timers
)
extend_results(i, all_boxes, cls_boxes_i)
if cls_segms_i is not None:
extend_results(i, all_segms, cls_segms_i)
if cls_keyps_i is not None:
extend_results(i, all_keyps, cls_keyps_i)
if i % 10 == 0: # Reduce log file size
ave_total_time = np.sum([t.average_time for t in timers.values()])
eta_seconds = ave_total_time * (num_images - i - 1)
eta = str(datetime.timedelta(seconds=int(eta_seconds)))
det_time = (
timers['im_detect_bbox'].average_time +
timers['im_detect_mask'].average_time +
timers['im_detect_keypoints'].average_time
)
misc_time = (
timers['misc_bbox'].average_time +
timers['misc_mask'].average_time +
timers['misc_keypoints'].average_time
)
logger.info(
(
'im_detect: range [{:d}, {:d}] of {:d}: '
'{:d}/{:d} {:.3f}s + {:.3f}s (eta: {})'
).format(
start_ind + 1, end_ind, total_num_images, start_ind + i + 1,
start_ind + num_images, det_time, misc_time, eta
)
)
if cfg.VIS:
im_name = os.path.splitext(os.path.basename(entry['image']))[0]
vis_utils.vis_one_image(
im[:, :, ::-1],
'{:d}_{:s}'.format(i, im_name),
os.path.join(output_dir, 'vis'),
cls_boxes_i,
segms=cls_segms_i,
keypoints=cls_keyps_i,
thresh=cfg.VIS_TH,
box_alpha=0.8,
dataset=dataset,
show_class=True
)
cfg_yaml = yaml.dump(cfg)
if ind_range is not None:
det_name = 'detection_range_%s_%s.pkl' % tuple(ind_range)
else:
det_name = 'detections.pkl'
det_file = os.path.join(output_dir, det_name)
save_object(
dict(
all_boxes=all_boxes,
all_segms=all_segms,
all_keyps=all_keyps,
cfg=cfg_yaml
), det_file
)
logger.info('Wrote detections to: {}'.format(os.path.abspath(det_file)))
return all_boxes, all_segms, all_keyps
def initialize_model_from_cfg():
"""Initialize a model from the global cfg. Loads test-time weights and
creates the networks in the Caffe2 workspace.
"""
model = model_builder.create(cfg.MODEL.TYPE, train=False)
net_utils.initialize_from_weights_file(
model, cfg.TEST.WEIGHTS, broadcast=False
)
model_builder.add_inference_inputs(model)
workspace.CreateNet(model.net)
workspace.CreateNet(model.conv_body_net)
if cfg.MODEL.MASK_ON:
workspace.CreateNet(model.mask_net)
if cfg.MODEL.KEYPOINTS_ON:
workspace.CreateNet(model.keypoint_net)
return model
def get_roidb_and_dataset(ind_range):
"""Get the roidb for the dataset specified in the global cfg. Optionally
restrict it to a range of indices if ind_range is a pair of integers.
"""
dataset = JsonDataset(cfg.TEST.DATASET)
if cfg.MODEL.FASTER_RCNN:
roidb = dataset.get_roidb()
else:
roidb = dataset.get_roidb(
proposal_file=cfg.TEST.PROPOSAL_FILE,
proposal_limit=cfg.TEST.PROPOSAL_LIMIT
)
if ind_range is not None:
total_num_images = len(roidb)
start, end = ind_range
roidb = roidb[start:end]
else:
start = 0
end = len(roidb)
total_num_images = end
return roidb, dataset, start, end, total_num_images
def empty_results(num_classes, num_images):
"""Return empty results lists for boxes, masks, and keypoints.
Box detections are collected into:
all_boxes[cls][image] = N x 5 array with columns (x1, y1, x2, y2, score)
Instance mask predictions are collected into:
all_segms[cls][image] = [...] list of COCO RLE encoded masks that are in
1:1 correspondence with the boxes in all_boxes[cls][image]
Keypoint predictions are collected into:
all_keyps[cls][image] = [...] list of keypoints results, each encoded as
a 3D array (#rois, 4, #keypoints) with the 4 rows corresponding to
[x, y, logit, prob] (See: utils.keypoints.heatmaps_to_keypoints).
Keypoints are recorded for person (cls = 1); they are in 1:1
correspondence with the boxes in all_boxes[cls][image].
"""
# Note: do not be tempted to use [[] * N], which gives N references to the
# *same* empty list.
all_boxes = [[[] for _ in range(num_images)] for _ in range(num_classes)]
all_segms = [[[] for _ in range(num_images)] for _ in range(num_classes)]
all_keyps = [[[] for _ in range(num_images)] for _ in range(num_classes)]
return all_boxes, all_segms, all_keyps
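# Quick demonstration of the aliasing pitfall noted above:
#   >>> a = [[]] * 2; a[0].append(1); a
#   [[1], [1]]
#   >>> b = [[] for _ in range(2)]; b[0].append(1); b
#   [[1], []]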
def extend_results(index, all_res, im_res):
"""Add results for an image to the set of all results at the specified
index.
"""
# Skip cls_idx 0 (__background__)
for cls_idx in range(1, len(im_res)):
all_res[cls_idx][index] = im_res[cls_idx]
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Test a RetinaNet network on an image database"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import cv2
import json
import os
import uuid
import yaml
import logging
from collections import defaultdict, OrderedDict
from caffe2.python import core, workspace
from core.config import cfg, get_output_dir
from core.rpn_generator import _get_image_blob
from datasets.json_dataset import JsonDataset
from datasets import task_evaluation
from modeling import model_builder
from modeling.generate_anchors import generate_anchors
from pycocotools.cocoeval import COCOeval
from utils.io import save_object
from utils.timer import Timer
import utils.boxes as box_utils
import utils.c2 as c2_utils
import utils.env as envu
import utils.net as nu
import utils.subprocess as subprocess_utils
logger = logging.getLogger(__name__)
def create_cell_anchors():
"""
Generate all types of anchors for all fpn levels/scales/aspect ratios.
This function is called only once at the beginning of inference.
"""
k_max, k_min = cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.RPN_MIN_LEVEL
scales_per_octave = cfg.RETINANET.SCALES_PER_OCTAVE
aspect_ratios = cfg.RETINANET.ASPECT_RATIOS
anchor_scale = cfg.RETINANET.ANCHOR_SCALE
A = scales_per_octave * len(aspect_ratios)
anchors = {}
for lvl in range(k_min, k_max + 1):
# create cell anchors array
stride = 2. ** lvl
cell_anchors = np.zeros((A, 4))
a = 0
for octave in range(scales_per_octave):
octave_scale = 2 ** (octave / float(scales_per_octave))
for aspect in aspect_ratios:
anchor_sizes = (stride * octave_scale * anchor_scale, )
anchor_aspect_ratios = (aspect, )
cell_anchors[a, :] = generate_anchors(
stride=stride, sizes=anchor_sizes,
aspect_ratios=anchor_aspect_ratios)
a += 1
anchors[lvl] = cell_anchors
return anchors
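# With common RetinaNet settings (assumed here, not read from a config):
# 3 scales per octave and aspect ratios (0.5, 1.0, 2.0) give
# A = 3 * 3 = 9 cell anchors per FPN level, with level lvl using stride
# 2**lvl and octave scales 2**(0/3), 2**(1/3), 2**(2/3).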
def im_detections(model, im, anchors):
"""Generate RetinaNet detections on a single image."""
k_max, k_min = cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.RPN_MIN_LEVEL
A = cfg.RETINANET.SCALES_PER_OCTAVE * len(cfg.RETINANET.ASPECT_RATIOS)
inputs = {}
inputs['data'], inputs['im_info'] = _get_image_blob(im)
cls_probs, box_preds = [], []
for lvl in range(k_min, k_max + 1):
suffix = 'fpn{}'.format(lvl)
cls_probs.append(core.ScopedName('retnet_cls_prob_{}'.format(suffix)))
box_preds.append(core.ScopedName('retnet_bbox_pred_{}'.format(suffix)))
for k, v in inputs.items():
workspace.FeedBlob(core.ScopedName(k), v.astype(np.float32, copy=False))
workspace.RunNet(model.net.Proto().name)
scale = inputs['im_info'][0, 2]
cls_probs = workspace.FetchBlobs(cls_probs)
box_preds = workspace.FetchBlobs(box_preds)
# here the boxes_all are [x0, y0, x1, y1, score]
boxes_all = defaultdict(list)
cnt = 0
for lvl in range(k_min, k_max + 1):
# create cell anchors array
stride = 2. ** lvl
cell_anchors = anchors[lvl]
# fetch per level probability
cls_prob = cls_probs[cnt]
box_pred = box_preds[cnt]
cls_prob = cls_prob.reshape((
cls_prob.shape[0], A, int(cls_prob.shape[1] / A),
cls_prob.shape[2], cls_prob.shape[3]))
box_pred = box_pred.reshape((
box_pred.shape[0], A, 4, box_pred.shape[2], box_pred.shape[3]))
cnt += 1
if cfg.RETINANET.SOFTMAX:
cls_prob = cls_prob[:, :, 1::, :, :]
cls_prob_ravel = cls_prob.ravel()
# In some cases [especially for very small img sizes], it's possible that
# candidate_inds is empty if we impose a threshold of 0.05 at all levels.
# This will lead to errors since no detections are found for this image.
# Hence, for the topmost level k_max (typically 7), which has the smallest
# spatial resolution, we lower the threshold to 0.0.
th = cfg.RETINANET.INFERENCE_TH if lvl < k_max else 0.0
candidate_inds = np.where(cls_prob_ravel > th)[0]
if (len(candidate_inds) == 0):
continue
pre_nms_topn = min(cfg.RETINANET.PRE_NMS_TOP_N, len(candidate_inds))
inds = np.argpartition(
cls_prob_ravel[candidate_inds], -pre_nms_topn)[-pre_nms_topn:]
inds = candidate_inds[inds]
inds_5d = np.array(np.unravel_index(inds, cls_prob.shape)).transpose()
classes = inds_5d[:, 2]
anchor_ids, y, x = inds_5d[:, 1], inds_5d[:, 3], inds_5d[:, 4]
scores = cls_prob[:, anchor_ids, classes, y, x]
boxes = np.column_stack((x, y, x, y)).astype(dtype=np.float32)
boxes *= stride
boxes += cell_anchors[anchor_ids, :]
if not cfg.RETINANET.CLASS_SPECIFIC_BBOX:
box_deltas = box_pred[0, anchor_ids, :, y, x]
else:
box_cls_inds = classes * 4
box_deltas = np.vstack(
[box_pred[0, ind:ind + 4, yi, xi]
for ind, yi, xi in zip(box_cls_inds, y, x)]
)
pred_boxes = (
box_utils.bbox_transform(boxes, box_deltas)
if cfg.TEST.BBOX_REG else boxes)
pred_boxes /= scale
pred_boxes = box_utils.clip_tiled_boxes(pred_boxes, im.shape)
box_scores = np.zeros((pred_boxes.shape[0], 5))
box_scores[:, 0:4] = pred_boxes
box_scores[:, 4] = scores
for cls in range(1, cfg.MODEL.NUM_CLASSES):
inds = np.where(classes == cls - 1)[0]
if len(inds) > 0:
boxes_all[cls].extend(box_scores[inds, :])
# Combine predictions across all levels and retain the top scoring by class
detections = []
for cls, boxes in boxes_all.items():
cls_dets = np.vstack(boxes).astype(dtype=np.float32)
# do class specific nms here
keep = box_utils.nms(cls_dets, cfg.TEST.NMS)
cls_dets = cls_dets[keep, :]
out = np.zeros((len(keep), 6))
out[:, 0:5] = cls_dets
out[:, 5].fill(cls)
detections.append(out)
detections = np.vstack(detections)
# Sort all detections by decreasing score and keep the top DETECTIONS_PER_IM
inds = np.argsort(-detections[:, 4])
detections = detections[inds[0:cfg.TEST.DETECTIONS_PER_IM], :]
boxes = detections[:, 0:4]
scores = detections[:, 4]
classes = detections[:, 5]
return boxes, scores, classes
def im_list_detections(model, im_list):
"""Generate RetinaNet proposals on all images in im_list."""
_t = Timer()
num_images = len(im_list)
im_list_boxes = [[] for _ in range(num_images)]
im_list_scores = [[] for _ in range(num_images)]
im_list_ids = [[] for _ in range(num_images)]
im_list_classes = [[] for _ in range(num_images)]
# create anchors for each level
anchors = create_cell_anchors()
for i in range(num_images):
im_list_ids[i] = im_list[i]['id']
im = cv2.imread(im_list[i]['image'])
with c2_utils.NamedCudaScope(0):
_t.tic()
im_list_boxes[i], im_list_scores[i], im_list_classes[i] = \
im_detections(model, im, anchors)
_t.toc()
logger.info(
'im_detections: {:d}/{:d} {:.3f}s'.format(
i + 1, num_images, _t.average_time))
return im_list_boxes, im_list_scores, im_list_classes, im_list_ids
def test_retinanet(ind_range=None):
"""
Test RetinaNet model either on the entire dataset or the subset of dataset
specified by the index range
"""
assert cfg.RETINANET.RETINANET_ON, \
'RETINANET_ON must be set for testing RetinaNet model'
output_dir = get_output_dir(training=False)
dataset = JsonDataset(cfg.TEST.DATASET)
im_list = dataset.get_roidb()
if ind_range is not None:
start, end = ind_range
im_list = im_list[start:end]
logger.info('Testing on roidb range: {}-{}'.format(start, end))
else:
# If testing over the whole dataset, use the NUM_TEST_IMAGES setting.
# NUM_TEST_IMAGES can be set to a small number of images for quick
# debugging purposes.
im_list = im_list[0:cfg.TEST.NUM_TEST_IMAGES]
model = model_builder.create(cfg.MODEL.TYPE, train=False)
if cfg.TEST.WEIGHTS:
nu.initialize_from_weights_file(
model, cfg.TEST.WEIGHTS, broadcast=False
)
model_builder.add_inference_inputs(model)
workspace.CreateNet(model.net)
boxes, scores, classes, image_ids = im_list_detections(
model, im_list[0:cfg.TEST.NUM_TEST_IMAGES])
cfg_yaml = yaml.dump(cfg)
if ind_range is not None:
det_name = 'retinanet_detections_range_%s_%s.pkl' % tuple(ind_range)
else:
det_name = 'retinanet_detections.pkl'
det_file = os.path.join(output_dir, det_name)
save_object(
dict(boxes=boxes, scores=scores, classes=classes, ids=image_ids, cfg=cfg_yaml),
det_file)
logger.info('Wrote detections to: {}'.format(os.path.abspath(det_file)))
return boxes, scores, classes, image_ids
def multi_gpu_test_retinanet_on_dataset(num_images, output_dir, dataset):
"""
If doing multi-gpu testing, we need to divide the data on various gpus and
make the subprocess call for each child process that'll run test_retinanet()
on its subset data. After all the subprocesses finish, we combine the results
and return
"""
# Retrieve the test_net binary path
binary_dir = envu.get_runtime_dir()
binary_ext = envu.get_py_bin_ext()
binary = os.path.join(binary_dir, 'test_net' + binary_ext)
assert os.path.exists(binary), 'Binary \'{}\' not found'.format(binary)
# Run inference in parallel in subprocesses
outputs = subprocess_utils.process_in_parallel(
'retinanet_detections', num_images, binary, output_dir)
# Combine the results from each subprocess now
boxes, scores, classes, image_ids = [], [], [], []
for det_data in outputs:
boxes.extend(det_data['boxes'])
scores.extend(det_data['scores'])
classes.extend(det_data['classes'])
image_ids.extend(det_data['ids'])
return boxes, scores, classes, image_ids
def test_retinanet_on_dataset(multi_gpu=False):
"""
Main entry point for testing on a given dataset: whether multi_gpu or not
"""
output_dir = get_output_dir(training=False)
logger.info('Output will be saved to: {:s}'.format(os.path.abspath(output_dir)))
dataset = JsonDataset(cfg.TEST.DATASET)
# for test-dev or full test dataset, we generate detections for all images
if 'test-dev' in cfg.TEST.DATASET or 'test' in cfg.TEST.DATASET:
cfg.TEST.NUM_TEST_IMAGES = len(dataset.get_roidb())
if multi_gpu:
num_images = cfg.TEST.NUM_TEST_IMAGES
boxes, scores, classes, image_ids = multi_gpu_test_retinanet_on_dataset(
num_images, output_dir, dataset
)
else:
boxes, scores, classes, image_ids = test_retinanet()
# write RetinaNet detections pkl file to be used for various purposes
# dump the boxes first just in case there are spurious failures
res_file = os.path.join(output_dir, 'retinanet_detections.pkl')
logger.info(
'Writing roidb detections to file: {}'.
format(os.path.abspath(res_file))
)
save_object(
dict(boxes=boxes, scores=scores, classes=classes, ids=image_ids),
res_file
)
logger.info('Wrote RetinaNet detections to {}'.format(os.path.abspath(res_file)))
# Write the detections to a json file that can be uploaded to the COCO
# evaluation server
res_file = write_coco_detection_results(
output_dir, dataset, boxes, scores, classes, image_ids)
# Perform coco evaluation
coco_eval = coco_evaluate(dataset, res_file, image_ids)
box_results = task_evaluation._coco_eval_to_box_results(coco_eval)
return OrderedDict([(dataset.name, box_results)])
def coco_evaluate(json_dataset, res_file, image_ids):
coco_dt = json_dataset.COCO.loadRes(str(res_file))
coco_eval = COCOeval(json_dataset.COCO, coco_dt, 'bbox')
coco_eval.params.imgIds = image_ids
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
return coco_eval
def write_coco_detection_results(
output_dir, json_dataset, all_boxes, all_scores, all_classes, image_ids,
use_salt=False
):
res_file = os.path.join(
output_dir, 'detections_' + json_dataset.name + '_results')
if use_salt:
res_file += '_{}'.format(str(uuid.uuid4()))
res_file += '.json'
logger.info('Writing RetinaNet detections for submitting to coco server...')
results = []
for (im_id, dets, cls, score) in zip(
image_ids, all_boxes, all_classes, all_scores
):
dets = dets.astype(np.float)
score = score.astype(np.float)
classes = np.array(
[json_dataset.contiguous_category_id_to_json_id[c] for c in cls])
xs = dets[:, 0]
ys = dets[:, 1]
ws = dets[:, 2] - xs + 1
hs = dets[:, 3] - ys + 1
results.extend(
[{'image_id': im_id,
'category_id': classes[k],
'bbox': [xs[k], ys[k], ws[k], hs[k]],
'score': score[k]} for k in range(dets.shape[0])])
logger.info('Writing detection results to json: {}'.format(
os.path.abspath(res_file)
))
with open(res_file, 'w') as fid:
json.dump(results, fid)
logger.info('Done!')
return res_file
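# Each entry written above follows the COCO detection results format, e.g.
# (values illustrative):
#   {"image_id": 42, "category_id": 18, "bbox": [x, y, width, height],
#    "score": 0.98}
# which is why the xyxy detections are converted to xywh before dumping.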
function VOCopts = get_voc_opts(path)
tmp = pwd;
cd(path);
try
addpath('VOCcode');
VOCinit;
catch
rmpath('VOCcode');
cd(tmp);
error(sprintf('VOCcode directory not found under %s', path));
end
rmpath('VOCcode');
cd(tmp);
function res = voc_eval(path, comp_id, test_set, output_dir)
VOCopts = get_voc_opts(path);
VOCopts.testset = test_set;
for i = 1:length(VOCopts.classes)
cls = VOCopts.classes{i};
res(i) = voc_eval_cls(cls, VOCopts, comp_id, output_dir);
end
fprintf('\n~~~~~~~~~~~~~~~~~~~~\n');
fprintf('Results:\n');
aps = [res(:).ap]';
fprintf('%.1f\n', aps * 100);
fprintf('%.1f\n', mean(aps) * 100);
fprintf('~~~~~~~~~~~~~~~~~~~~\n');
function res = voc_eval_cls(cls, VOCopts, comp_id, output_dir)
test_set = VOCopts.testset;
year = VOCopts.dataset(4:end);
addpath(fullfile(VOCopts.datadir, 'VOCcode'));
res_fn = sprintf(VOCopts.detrespath, comp_id, cls);
recall = [];
prec = [];
ap = 0;
ap_auc = 0;
do_eval = (str2num(year) <= 2007) | ~strcmp(test_set, 'test');
if do_eval
% Bug in VOCevaldet requires that tic has been called first
tic;
[recall, prec, ap] = VOCevaldet(VOCopts, comp_id, cls, true);
ap_auc = xVOCap(recall, prec);
% force plot limits
ylim([0 1]);
xlim([0 1]);
print(gcf, '-djpeg', '-r0', ...
[output_dir '/' cls '_pr.jpg']);
end
fprintf('!!! %s : %.4f %.4f\n', cls, ap, ap_auc);
res.recall = recall;
res.prec = prec;
res.ap = ap;
res.ap_auc = ap_auc;
save([output_dir '/' cls '_pr.mat'], ...
'res', 'recall', 'prec', 'ap', 'ap_auc');
rmpath(fullfile(VOCopts.datadir, 'VOCcode'));
function ap = xVOCap(rec,prec)
% From the PASCAL VOC 2011 devkit
mrec=[0 ; rec ; 1];
mpre=[0 ; prec ; 0];
for i=numel(mpre)-1:-1:1
mpre(i)=max(mpre(i),mpre(i+1));
end
i=find(mrec(2:end)~=mrec(1:end-1))+1;
ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
# mapping coco categories to cityscapes (our converted json) id
# cityscapes
# INFO roidb.py: 220: 1 bicycle: 7286
# INFO roidb.py: 220: 2 car: 53684
# INFO roidb.py: 220: 3 person: 35704
# INFO roidb.py: 220: 4 train: 336
# INFO roidb.py: 220: 5 truck: 964
# INFO roidb.py: 220: 6 motorcycle: 1468
# INFO roidb.py: 220: 7 bus: 758
# INFO roidb.py: 220: 8 rider: 3504
# coco (val5k)
# INFO roidb.py: 220: 1 person: 21296
# INFO roidb.py: 220: 2 bicycle: 628
# INFO roidb.py: 220: 3 car: 3818
# INFO roidb.py: 220: 4 motorcycle: 732
# INFO roidb.py: 220: 5 airplane: 286 <------ irrelevant
# INFO roidb.py: 220: 6 bus: 564
# INFO roidb.py: 220: 7 train: 380
# INFO roidb.py: 220: 8 truck: 828
def cityscapes_to_coco(cityscapes_id):
lookup = {
0: 0, # ... background
1: 2, # bicycle
2: 3, # car
3: 1, # person
4: 7, # train
5: 8, # truck
6: 4, # motorcycle
7: 6, # bus
8: -1, # rider (-1 means rand init)
}
return lookup[cityscapes_id]
def cityscapes_to_coco_with_rider(cityscapes_id):
lookup = {
0: 0, # ... background
1: 2, # bicycle
2: 3, # car
3: 1, # person
4: 7, # train
5: 8, # truck
6: 4, # motorcycle
7: 6, # bus
8: 1, # rider (mapped to the "person" class)
}
return lookup[cityscapes_id]
def cityscapes_to_coco_without_person_rider(cityscapes_id):
lookup = {
0: 0, # ... background
1: 2, # bicycle
2: 3, # car
3: -1, # person (ignore)
4: 7, # train
5: 8, # truck
6: 4, # motorcycle
7: 6, # bus
8: -1, # rider (ignore)
}
return lookup[cityscapes_id]
def cityscapes_to_coco_all_random(cityscapes_id):
lookup = {
0: -1, # ... background
1: -1, # bicycle
2: -1, # car
3: -1, # person (ignore)
4: -1, # train
5: -1, # truck
6: -1, # motorcycle
7: -1, # bus
8: -1, # rider (ignore)
}
return lookup[cityscapes_id]
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import h5py
import json
import os
import scipy.misc
import sys
import cityscapesscripts.evaluation.instances2dict_with_polygons as cs
import utils.segms as segms_util
import utils.boxes as bboxs_util
def parse_args():
parser = argparse.ArgumentParser(description='Convert dataset')
parser.add_argument(
'--dataset', help="cocostuff, cityscapes", default=None, type=str)
parser.add_argument(
'--outdir', help="output dir for json files", default=None, type=str)
parser.add_argument(
'--datadir', help="data dir for annotations to be converted",
default=None, type=str)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def convert_coco_stuff_mat(data_dir, out_dir):
"""Convert to png and save json with path. This currently only contains
the segmentation labels for objects+stuff in cocostuff - if we need to
combine with other labels from original COCO that will be a TODO."""
sets = ['train', 'val']
categories = []
json_name = 'coco_stuff_%s.json'
ann_dict = {}
for data_set in sets:
file_list = os.path.join(data_dir, '%s.txt')
images = []
with open(file_list % data_set) as f:
for img_id, img_name in enumerate(f):
img_name = img_name.replace('coco', 'COCO').strip('\n')
image = {}
mat_file = os.path.join(
data_dir, 'annotations/%s.mat' % img_name)
data = h5py.File(mat_file, 'r')
labelMap = data.get('S')
if len(categories) == 0:
labelNames = data.get('names')
for idx, n in enumerate(labelNames):
categories.append(
{"id": idx, "name": ''.join(chr(i) for i in data[
n[0]])})
ann_dict['categories'] = categories
scipy.misc.imsave(
os.path.join(data_dir, img_name + '.png'), labelMap)
image['width'] = labelMap.shape[0]
image['height'] = labelMap.shape[1]
image['file_name'] = img_name
image['seg_file_name'] = img_name
image['id'] = img_id
images.append(image)
ann_dict['images'] = images
print("Num images: %s" % len(images))
with open(os.path.join(out_dir, json_name % data_set), 'wb') as outfile:
outfile.write(json.dumps(ann_dict))
# for Cityscapes
def getLabelID(instID):
if (instID < 1000):
return instID
else:
return int(instID / 1000)
def convert_cityscapes_instance_only(
data_dir, out_dir):
"""Convert from cityscapes format to COCO instance seg format - polygons"""
sets = [
'gtFine_val',
# 'gtFine_train',
# 'gtFine_test',
# 'gtCoarse_train',
# 'gtCoarse_val',
# 'gtCoarse_train_extra'
]
ann_dirs = [
'gtFine_trainvaltest/gtFine/val',
# 'gtFine_trainvaltest/gtFine/train',
# 'gtFine_trainvaltest/gtFine/test',
# 'gtCoarse/train',
# 'gtCoarse/train_extra',
# 'gtCoarse/val'
]
json_name = 'instancesonly_filtered_%s.json'
ends_in = '%s_polygons.json'
img_id = 0
ann_id = 0
cat_id = 1
category_dict = {}
category_instancesonly = [
'person',
'rider',
'car',
'truck',
'bus',
'train',
'motorcycle',
'bicycle',
]
for data_set, ann_dir in zip(sets, ann_dirs):
print('Starting %s' % data_set)
ann_dict = {}
images = []
annotations = []
ann_dir = os.path.join(data_dir, ann_dir)
for root, _, files in os.walk(ann_dir):
for filename in files:
if filename.endswith(ends_in % data_set.split('_')[0]):
if len(images) % 50 == 0:
print("Processed %s images, %s annotations" % (
len(images), len(annotations)))
json_ann = json.load(open(os.path.join(root, filename)))
image = {}
image['id'] = img_id
img_id += 1
image['width'] = json_ann['imgWidth']
image['height'] = json_ann['imgHeight']
image['file_name'] = filename[:-len(
ends_in % data_set.split('_')[0])] + 'leftImg8bit.png'
image['seg_file_name'] = filename[:-len(
ends_in % data_set.split('_')[0])] + \
'%s_instanceIds.png' % data_set.split('_')[0]
images.append(image)
fullname = os.path.join(root, image['seg_file_name'])
objects = cs.instances2dict_with_polygons(
[fullname], verbose=False)[fullname]
for object_cls in objects:
if object_cls not in category_instancesonly:
continue # skip non-instance categories
for obj in objects[object_cls]:
if obj['contours'] == []:
print('Warning: empty contours.')
continue # skip objects with empty contours
len_p = [len(p) for p in obj['contours']]
if min(len_p) <= 4:
print('Warning: invalid contours.')
continue # skip objects with degenerate contours
ann = {}
ann['id'] = ann_id
ann_id += 1
ann['image_id'] = image['id']
ann['segmentation'] = obj['contours']
if object_cls not in category_dict:
category_dict[object_cls] = cat_id
cat_id += 1
ann['category_id'] = category_dict[object_cls]
ann['iscrowd'] = 0
ann['area'] = obj['pixelCount']
ann['bbox'] = bboxs_util.xyxy_to_xywh(
segms_util.polys_to_boxes(
[ann['segmentation']])).tolist()[0]
annotations.append(ann)
ann_dict['images'] = images
categories = [{"id": category_dict[name], "name": name} for name in
category_dict]
ann_dict['categories'] = categories
ann_dict['annotations'] = annotations
print("Num categories: %s" % len(categories))
print("Num images: %s" % len(images))
print("Num annotations: %s" % len(annotations))
with open(os.path.join(out_dir, json_name % data_set), 'wb') as outfile:
outfile.write(json.dumps(ann_dict))
if __name__ == '__main__':
args = parse_args()
if args.dataset == "cityscapes_instance_only":
convert_cityscapes_instance_only(args.datadir, args.outdir)
elif args.dataset == "cocostuff":
convert_coco_stuff_mat(args.datadir, args.outdir)
else:
print("Dataset not supported: %s" % args.dataset)
# Convert a detection model trained for COCO into a model that can be fine-tuned
# on cityscapes
#
# cityscapes_to_coco is the default blob conversion function (see the
# --convert_func flag below)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cPickle as pickle
import argparse
import os
import sys
import numpy as np
import datasets.cityscapes.coco_to_cityscapes_id as cs
NUM_CS_CLS = 9
NUM_COCO_CLS = 81
def parse_args():
parser = argparse.ArgumentParser(
description='Convert a COCO pre-trained model for use with Cityscapes')
parser.add_argument(
'--coco_model', dest='coco_model_file_name',
help='Pretrained network weights file path',
default=None, type=str)
parser.add_argument(
'--convert_func', dest='convert_func',
help='Blob conversion function',
default='cityscapes_to_coco', type=str)
parser.add_argument(
'--output', dest='out_file_name',
help='Output file path',
default=None, type=str)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
args = parser.parse_args()
return args
def convert_coco_blobs_to_cityscape_blobs(model_dict, convert_func):
for k, v in model_dict['blobs'].items():
if v.shape[0] == NUM_COCO_CLS or v.shape[0] == 4 * NUM_COCO_CLS:
coco_blob = model_dict['blobs'][k]
print(
'Converting COCO blob {} with shape {}'.
format(k, coco_blob.shape)
)
cs_blob = convert_coco_blob_to_cityscapes_blob(
coco_blob, convert_func
)
print(' -> converted shape {}'.format(cs_blob.shape))
model_dict['blobs'][k] = cs_blob
def convert_coco_blob_to_cityscapes_blob(coco_blob, convert_func):
# coco blob (81, ...) or (81*4, ...)
coco_shape = coco_blob.shape
leading_factor = int(coco_shape[0] / NUM_COCO_CLS)
tail_shape = list(coco_shape[1:])
assert leading_factor == 1 or leading_factor == 4
# Reshape in [num_classes, ...] form for easier manipulations
coco_blob = coco_blob.reshape([NUM_COCO_CLS, -1] + tail_shape)
# Default initialization uses Gaussian with mean and std to match the
# existing parameters
std = coco_blob.std()
mean = coco_blob.mean()
cs_shape = [NUM_CS_CLS] + list(coco_blob.shape[1:])
cs_blob = (np.random.randn(*cs_shape) * std + mean).astype(np.float32)
# Replace random parameters with COCO parameters if class mapping exists
for i in range(NUM_CS_CLS):
coco_cls_id = getattr(cs, convert_func)(i)
if coco_cls_id >= 0: # otherwise ignore (rand init)
cs_blob[i] = coco_blob[coco_cls_id]
cs_shape = [NUM_CS_CLS * leading_factor] + tail_shape
return cs_blob.reshape(cs_shape)
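# Shape example (illustrative): a COCO cls_score weight blob of shape
# (81, 1024) becomes (9, 1024) for Cityscapes, while a bbox_pred blob of
# shape (324, 1024) is viewed as (81, 4, 1024), remapped per class, and
# reshaped back to (36, 1024).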
def remove_momentum(model_dict):
for k in model_dict['blobs'].keys():
if k.endswith('_momentum'):
del model_dict['blobs'][k]
def load_and_convert_coco_model(args):
with open(args.coco_model_file_name, 'r') as f:
model_dict = pickle.load(f)
remove_momentum(model_dict)
convert_coco_blobs_to_cityscape_blobs(model_dict)
return model_dict
if __name__ == '__main__':
args = parse_args()
print(args)
assert os.path.exists(args.coco_model_file_name), \
'Weights file does not exist'
weights = load_and_convert_coco_model(args)
with open(args.out_file_name, 'w') as f:
pickle.dump(weights, f, protocol=pickle.HIGHEST_PROTOCOL)
print('Wrote blobs to {}:'.format(args.out_file_name))
print(sorted(weights['blobs'].keys()))
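# Example invocation (a sketch; the script and weights file names are
# placeholders, the flags match parse_args above):
#   python convert_coco_model_to_cityscapes.py \
#       --coco_model coco_weights.pkl \
#       --convert_func cityscapes_to_coco \
#       --output cityscapes_init.pkl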
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Functions for evaluating results on Cityscapes."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cv2
import logging
import os
import uuid
import pycocotools.mask as mask_util
from core.config import cfg
from datasets.dataset_catalog import DATASETS
from datasets.dataset_catalog import RAW_DIR
logger = logging.getLogger(__name__)
def evaluate_masks(
json_dataset,
all_boxes,
all_segms,
output_dir,
use_salt=True,
cleanup=False
):
if cfg.CLUSTER.ON_CLUSTER:
# On the cluster avoid saving these files in the job directory
output_dir = '/tmp'
res_file = os.path.join(
output_dir, 'segmentations_' + json_dataset.name + '_results')
if use_salt:
res_file += '_{}'.format(str(uuid.uuid4()))
res_file += '.json'
results_dir = os.path.join(output_dir, 'results')
if not os.path.exists(results_dir):
os.mkdir(results_dir)
os.environ['CITYSCAPES_DATASET'] = DATASETS[json_dataset.name][RAW_DIR]
os.environ['CITYSCAPES_RESULTS'] = output_dir
# Load the Cityscapes eval script *after* setting the required env vars,
# since the script reads their values into global variables (at load time).
import cityscapesscripts.evaluation.evalInstanceLevelSemanticLabeling \
as cityscapes_eval
roidb = json_dataset.get_roidb()
for i, entry in enumerate(roidb):
im_name = entry['image']
basename = os.path.splitext(os.path.basename(im_name))[0]
txtname = os.path.join(output_dir, basename + 'pred.txt')
with open(txtname, 'w') as fid_txt:
if i % 10 == 0:
logger.info('i: {}: {}'.format(i, basename))
for j in range(1, len(all_segms)):
clss = json_dataset.classes[j]
clss_id = cityscapes_eval.name2label[clss].id
segms = all_segms[j][i]
boxes = all_boxes[j][i]
if segms == []:
continue
masks = mask_util.decode(segms)
for k in range(boxes.shape[0]):
score = boxes[k, -1]
mask = masks[:, :, k]
pngname = os.path.join(
'results',
basename + '_' + clss + '_{}.png'.format(k))
# write txt
fid_txt.write('{} {} {}\n'.format(pngname, clss_id, score))
# save mask
cv2.imwrite(os.path.join(output_dir, pngname), mask * 255)
logger.info('Evaluating...')
cityscapes_eval.main([])
return None
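# For reference, each line written to <basename>pred.txt above has the form
# expected by the Cityscapes instance-level evaluator; the values below are
# illustrative only:
#   results/frankfurt_000000_000294_leftImg8bit_car_0.png 26 0.98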
# Setting Up Datasets
This directory contains symlinks to data locations.
## Creating Symlinks for COCO
Symlink the COCO dataset:
```
ln -s /path/to/coco $DETECTRON/lib/datasets/data/coco
```
We assume that your local COCO dataset copy at `/path/to/coco` has the following directory structure:
```
coco
|_ coco_train2014
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ coco_val2014
|_ ...
|_ annotations
|_ instances_train2014.json
|_ ...
```
If that is not the case, you may need to do something similar to:
```
mkdir -p $DETECTRON/lib/datasets/data/coco
ln -s /path/to/coco_train2014 $DETECTRON/lib/datasets/data/coco/
ln -s /path/to/coco_val2014 $DETECTRON/lib/datasets/data/coco/
ln -s /path/to/json/annotations $DETECTRON/lib/datasets/data/coco/annotations
```
### COCO Minival Annotations
Our custom `minival` and `valminusminival` annotations are available for download [here](https://s3-us-west-2.amazonaws.com/detectron/coco/coco_annotations_minival.tgz).
Please note that `minival` is exactly equivalent to the recently defined 2017 `val` set.
Similarly, the union of `valminusminival` and the 2014 `train` is exactly equivalent to the 2017 `train` set. To complete installation of the COCO dataset, you will need to copy the `minival` and `valminusminival` json annotation files to the `coco/annotations` directory referenced above.
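For example, a minimal download-and-install sketch (this assumes the archive extracts the two json files into the current directory; adjust if the layout differs):
```
wget https://s3-us-west-2.amazonaws.com/detectron/coco/coco_annotations_minival.tgz
tar xzf coco_annotations_minival.tgz
cp instances_minival2014.json instances_valminusminival2014.json \
  $DETECTRON/lib/datasets/data/coco/annotations/
```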
## Creating Symlinks for PASCAL VOC
Symlink the PASCAL VOC dataset:
```
# VOC 2007
mkdir -p $DETECTRON/lib/datasets/data/VOC2007
ln -s /path/to/VOC2007/JPEG/images $DETECTRON/lib/datasets/data/VOC2007/JPEGImages
ln -s /path/to/VOC2007/json/annotations $DETECTRON/lib/datasets/data/VOC2007/annotations
ln -s /path/to/VOC2007/devkit $DETECTRON/lib/datasets/data/VOC2007/VOCdevkit2007
# VOC 2012
mkdir -p $DETECTRON/lib/datasets/data/VOC2012
ln -s /path/to/VOC2012/JPEG/images $DETECTRON/lib/datasets/data/VOC2012/JPEGImages
ln -s /path/to/VOC2012/json/annotations $DETECTRON/lib/datasets/data/VOC2012/annotations
ln -s /path/to/VOC2012/devkit $DETECTRON/lib/datasets/data/VOC2012/VOCdevkit2012
```
## Creating Symlinks for Cityscapes
Symlink the Cityscapes dataset:
```
mkdir -p $DETECTRON/lib/datasets/data/cityscapes
ln -s /path/to/cityscapes/images $DETECTRON/lib/datasets/data/cityscapes/images
ln -s /path/to/cityscapes/json/annotations $DETECTRON/lib/datasets/data/cityscapes/annotations
ln -s /path/to/cityscapes/root/dir $DETECTRON/lib/datasets/data/cityscapes/raw
```
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Collection of available datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
# Path to data dir
_DATA_DIR = os.path.join(os.path.dirname(__file__), 'data')
# Required dataset entry keys
IM_DIR = 'image_directory'
ANN_FN = 'annotation_file'
# Optional dataset entry keys
IM_PREFIX = 'image_prefix'
DEVKIT_DIR = 'devkit_directory'
RAW_DIR = 'raw_dir'
# Available datasets
DATASETS = {
'cityscapes_fine_instanceonly_seg_train': {
IM_DIR:
_DATA_DIR + '/cityscapes/images',
ANN_FN:
_DATA_DIR + '/cityscapes/annotations/instancesonly_gtFine_train.json',
RAW_DIR:
_DATA_DIR + '/cityscapes/raw'
},
'cityscapes_fine_instanceonly_seg_val': {
IM_DIR:
_DATA_DIR + '/cityscapes/images',
# use filtered validation as there is an issue converting contours
ANN_FN:
_DATA_DIR + '/cityscapes/annotations/instancesonly_filtered_gtFine_val.json',
RAW_DIR:
_DATA_DIR + '/cityscapes/raw'
},
'cityscapes_fine_instanceonly_seg_test': {
IM_DIR:
_DATA_DIR + '/cityscapes/images',
ANN_FN:
_DATA_DIR + '/cityscapes/annotations/instancesonly_gtFine_test.json',
RAW_DIR:
_DATA_DIR + '/cityscapes/raw'
},
'coco_2014_train': {
IM_DIR:
_DATA_DIR + '/coco/coco_train2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/instances_train2014.json'
},
'coco_2014_val': {
IM_DIR:
_DATA_DIR + '/coco/coco_val2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/instances_val2014.json'
},
'coco_2014_minival': {
IM_DIR:
_DATA_DIR + '/coco/coco_val2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/instances_minival2014.json'
},
'coco_2014_valminusminival': {
IM_DIR:
_DATA_DIR + '/coco/coco_val2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/instances_valminusminival2014.json'
},
'coco_2015_test': {
IM_DIR:
_DATA_DIR + '/coco/coco_test2015',
ANN_FN:
_DATA_DIR + '/coco/annotations/image_info_test2015.json'
},
'coco_2015_test-dev': {
IM_DIR:
_DATA_DIR + '/coco/coco_test2015',
ANN_FN:
_DATA_DIR + '/coco/annotations/image_info_test-dev2015.json'
},
'coco_2017_test': { # 2017 test uses 2015 test images
IM_DIR:
_DATA_DIR + '/coco/coco_test2015',
ANN_FN:
_DATA_DIR + '/coco/annotations/image_info_test2017.json',
IM_PREFIX:
'COCO_test2015_'
},
'coco_2017_test-dev': { # 2017 test-dev uses 2015 test images
IM_DIR:
_DATA_DIR + '/coco/coco_test2015',
ANN_FN:
_DATA_DIR + '/coco/annotations/image_info_test-dev2017.json',
IM_PREFIX:
'COCO_test2015_'
},
'coco_stuff_train': {
IM_DIR:
_DATA_DIR + '/coco/coco_train2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/coco_stuff_train.json'
},
'coco_stuff_val': {
IM_DIR:
_DATA_DIR + '/coco/coco_val2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/coco_stuff_val.json'
},
'keypoints_coco_2014_train': {
IM_DIR:
_DATA_DIR + '/coco/coco_train2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/person_keypoints_train2014.json'
},
'keypoints_coco_2014_val': {
IM_DIR:
_DATA_DIR + '/coco/coco_val2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/person_keypoints_val2014.json'
},
'keypoints_coco_2014_minival': {
IM_DIR:
_DATA_DIR + '/coco/coco_val2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/person_keypoints_minival2014.json'
},
'keypoints_coco_2014_valminusminival': {
IM_DIR:
_DATA_DIR + '/coco/coco_val2014',
ANN_FN:
_DATA_DIR + '/coco/annotations/person_keypoints_valminusminival2014.json'
},
'keypoints_coco_2015_test': {
IM_DIR:
_DATA_DIR + '/coco/coco_test2015',
ANN_FN:
_DATA_DIR + '/coco/annotations/image_info_test2015.json'
},
'keypoints_coco_2015_test-dev': {
IM_DIR:
_DATA_DIR + '/coco/coco_test2015',
ANN_FN:
_DATA_DIR + '/coco/annotations/image_info_test-dev2015.json'
},
'voc_2007_trainval': {
IM_DIR:
_DATA_DIR + '/VOC2007/JPEGImages',
ANN_FN:
_DATA_DIR + '/VOC2007/annotations/voc_2007_trainval.json',
DEVKIT_DIR:
_DATA_DIR + '/VOC2007/VOCdevkit2007'
},
'voc_2007_test': {
IM_DIR:
_DATA_DIR + '/VOC2007/JPEGImages',
ANN_FN:
_DATA_DIR + '/VOC2007/annotations/voc_2007_test.json',
DEVKIT_DIR:
_DATA_DIR + '/VOC2007/VOCdevkit2007'
},
'voc_2012_trainval': {
IM_DIR:
_DATA_DIR + '/VOC2012/JPEGImages',
ANN_FN:
_DATA_DIR + '/VOC2012/annotations/voc_2012_trainval.json',
DEVKIT_DIR:
_DATA_DIR + '/VOC2012/VOCdevkit2012'
}
}
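# To register a new dataset, append an entry with at least the required keys
# above (a sketch; 'my_dataset_train' and its paths are placeholders):
#
# DATASETS['my_dataset_train'] = {
#     IM_DIR: _DATA_DIR + '/my_dataset/images',
#     ANN_FN: _DATA_DIR + '/my_dataset/annotations/instances_train.json',
# }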
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Provide stub objects that can act as stand-in "dummy" datasets for simple use
cases, like getting all classes in a dataset. This exists so that demos can be
run without requiring users to download/install datasets first.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from utils.collections import AttrDict
def get_coco_dataset():
"""A dummy COCO dataset that includes only the 'classes' field."""
ds = AttrDict()
classes = [
'__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack',
'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass',
'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv',
'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
ds.classes = {i: name for i, name in enumerate(classes)}
return ds
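# Example usage (a sketch):
#   ds = get_coco_dataset()
#   ds.classes[0]    # '__background__'
#   ds.classes[1]    # 'person'
#   len(ds.classes)  # 81 (80 COCO categories plus background)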
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Representation of the standard COCO json dataset format.
When working with a new dataset, we strongly suggest to convert the dataset into
the COCO json format and use the existing code; it is not recommended to write
code to support new dataset formats.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import cPickle as pickle
import logging
import numpy as np
import os
import scipy.sparse
# Must happen before importing COCO API (which imports matplotlib)
import utils.env as envu
envu.set_up_matplotlib()
# COCO API
from pycocotools import mask as COCOmask
from pycocotools.coco import COCO
from core.config import cfg
from datasets.dataset_catalog import ANN_FN
from datasets.dataset_catalog import DATASETS
from datasets.dataset_catalog import IM_DIR
from datasets.dataset_catalog import IM_PREFIX
from utils.timer import Timer
import utils.boxes as box_utils
logger = logging.getLogger(__name__)
class JsonDataset(object):
"""A class representing a COCO json dataset."""
def __init__(self, name):
assert name in DATASETS.keys(), \
'Unknown dataset name: {}'.format(name)
assert os.path.exists(DATASETS[name][IM_DIR]), \
'Image directory \'{}\' not found'.format(DATASETS[name][IM_DIR])
assert os.path.exists(DATASETS[name][ANN_FN]), \
'Annotation file \'{}\' not found'.format(DATASETS[name][ANN_FN])
logger.debug('Creating: {}'.format(name))
self.name = name
self.image_directory = DATASETS[name][IM_DIR]
self.image_prefix = (
'' if IM_PREFIX not in DATASETS[name] else DATASETS[name][IM_PREFIX]
)
self.COCO = COCO(DATASETS[name][ANN_FN])
self.debug_timer = Timer()
# Set up dataset classes
category_ids = self.COCO.getCatIds()
categories = [c['name'] for c in self.COCO.loadCats(category_ids)]
self.category_to_id_map = dict(zip(categories, category_ids))
self.classes = ['__background__'] + categories
self.num_classes = len(self.classes)
self.json_category_id_to_contiguous_id = {
v: i + 1
for i, v in enumerate(self.COCO.getCatIds())
}
self.contiguous_category_id_to_json_id = {
v: k
for k, v in self.json_category_id_to_contiguous_id.items()
}
self._init_keypoints()
def get_roidb(
self,
gt=False,
proposal_file=None,
min_proposal_size=2,
proposal_limit=-1,
crowd_filter_thresh=0
):
"""Return an roidb corresponding to the json dataset. Optionally:
- include ground truth boxes in the roidb
- add proposals specified in a proposals file
- filter proposals based on a minimum side length
- filter proposals that intersect with crowd regions
"""
assert gt is True or crowd_filter_thresh == 0, \
'Crowd filter threshold must be 0 if ground-truth annotations ' \
'are not included.'
image_ids = self.COCO.getImgIds()
image_ids.sort()
roidb = copy.deepcopy(self.COCO.loadImgs(image_ids))
for entry in roidb:
self._prep_roidb_entry(entry)
if gt:
# Include ground-truth object annotations
self.debug_timer.tic()
for entry in roidb:
self._add_gt_annotations(entry)
logger.debug(
'_add_gt_annotations took {:.3f}s'.
format(self.debug_timer.toc(average=False))
)
if proposal_file is not None:
# Include proposals from a file
self.debug_timer.tic()
self._add_proposals_from_file(
roidb, proposal_file, min_proposal_size, proposal_limit,
crowd_filter_thresh
)
logger.debug(
'_add_proposals_from_file took {:.3f}s'.
format(self.debug_timer.toc(average=False))
)
_add_class_assignments(roidb)
return roidb
def _prep_roidb_entry(self, entry):
"""Adds empty metadata fields to an roidb entry."""
# Reference back to the parent dataset
entry['dataset'] = self
# Make file_name an abs path
entry['image'] = os.path.join(
self.image_directory, self.image_prefix + entry['file_name']
)
entry['flipped'] = False
entry['has_visible_keypoints'] = False
# Empty placeholders
entry['boxes'] = np.empty((0, 4), dtype=np.float32)
entry['segms'] = []
entry['gt_classes'] = np.empty((0), dtype=np.int32)
entry['seg_areas'] = np.empty((0), dtype=np.float32)
entry['gt_overlaps'] = scipy.sparse.csr_matrix(
np.empty((0, self.num_classes), dtype=np.float32)
)
entry['is_crowd'] = np.empty((0), dtype=np.bool)
# 'box_to_gt_ind_map': Shape is (#rois). Maps from each roi to the index
# in the list of rois that satisfy np.where(entry['gt_classes'] > 0)
entry['box_to_gt_ind_map'] = np.empty((0), dtype=np.int32)
if self.keypoints is not None:
entry['gt_keypoints'] = np.empty(
(0, 3, self.num_keypoints), dtype=np.int32
)
# Remove unwanted fields that come from the json file (if they exist)
for k in ['date_captured', 'url', 'license', 'file_name']:
if k in entry:
del entry[k]
def _add_gt_annotations(self, entry):
"""Add ground truth annotation metadata to an roidb entry."""
ann_ids = self.COCO.getAnnIds(imgIds=entry['id'], iscrowd=None)
objs = self.COCO.loadAnns(ann_ids)
# Sanitize bboxes -- some are invalid
valid_objs = []
valid_segms = []
width = entry['width']
height = entry['height']
for obj in objs:
# crowd regions are RLE encoded and stored as dicts
if isinstance(obj['segmentation'], list):
# Valid polygons have >= 3 points, so require >= 6 coordinates
obj['segmentation'] = [
p for p in obj['segmentation'] if len(p) >= 6
]
if obj['area'] < cfg.TRAIN.GT_MIN_AREA:
continue
if 'ignore' in obj and obj['ignore'] == 1:
continue
            # Convert from (x1, y1, w, h) to (x1, y1, x2, y2)
x1, y1, x2, y2 = box_utils.xywh_to_xyxy(obj['bbox'])
x1, y1, x2, y2 = box_utils.clip_xyxy_to_image(
x1, y1, x2, y2, height, width
)
# Require non-zero seg area and more than 1x1 box size
if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2]
valid_objs.append(obj)
valid_segms.append(obj['segmentation'])
num_valid_objs = len(valid_objs)
boxes = np.zeros((num_valid_objs, 4), dtype=entry['boxes'].dtype)
gt_classes = np.zeros((num_valid_objs), dtype=entry['gt_classes'].dtype)
gt_overlaps = np.zeros(
(num_valid_objs, self.num_classes),
dtype=entry['gt_overlaps'].dtype
)
seg_areas = np.zeros((num_valid_objs), dtype=entry['seg_areas'].dtype)
is_crowd = np.zeros((num_valid_objs), dtype=entry['is_crowd'].dtype)
box_to_gt_ind_map = np.zeros(
(num_valid_objs), dtype=entry['box_to_gt_ind_map'].dtype
)
if self.keypoints is not None:
gt_keypoints = np.zeros(
(num_valid_objs, 3, self.num_keypoints),
dtype=entry['gt_keypoints'].dtype
)
im_has_visible_keypoints = False
for ix, obj in enumerate(valid_objs):
cls = self.json_category_id_to_contiguous_id[obj['category_id']]
boxes[ix, :] = obj['clean_bbox']
gt_classes[ix] = cls
seg_areas[ix] = obj['area']
is_crowd[ix] = obj['iscrowd']
box_to_gt_ind_map[ix] = ix
if self.keypoints is not None:
gt_keypoints[ix, :, :] = self._get_gt_keypoints(obj)
if np.sum(gt_keypoints[ix, 2, :]) > 0:
im_has_visible_keypoints = True
if obj['iscrowd']:
# Set overlap to -1 for all classes for crowd objects
# so they will be excluded during training
gt_overlaps[ix, :] = -1.0
else:
gt_overlaps[ix, cls] = 1.0
entry['boxes'] = np.append(entry['boxes'], boxes, axis=0)
entry['segms'].extend(valid_segms)
# To match the original implementation:
# entry['boxes'] = np.append(
# entry['boxes'], boxes.astype(np.int).astype(np.float), axis=0)
entry['gt_classes'] = np.append(entry['gt_classes'], gt_classes)
entry['seg_areas'] = np.append(entry['seg_areas'], seg_areas)
entry['gt_overlaps'] = np.append(
entry['gt_overlaps'].toarray(), gt_overlaps, axis=0
)
entry['gt_overlaps'] = scipy.sparse.csr_matrix(entry['gt_overlaps'])
entry['is_crowd'] = np.append(entry['is_crowd'], is_crowd)
entry['box_to_gt_ind_map'] = np.append(
entry['box_to_gt_ind_map'], box_to_gt_ind_map
)
if self.keypoints is not None:
entry['gt_keypoints'] = np.append(
entry['gt_keypoints'], gt_keypoints, axis=0
)
entry['has_visible_keypoints'] = im_has_visible_keypoints
def _add_proposals_from_file(
self, roidb, proposal_file, min_proposal_size, top_k, crowd_thresh
):
"""Add proposals from a proposals file to an roidb."""
logger.info('Loading proposals from: {}'.format(proposal_file))
with open(proposal_file, 'r') as f:
proposals = pickle.load(f)
id_field = 'indexes' if 'indexes' in proposals else 'ids' # compat fix
_sort_proposals(proposals, id_field)
box_list = []
for i, entry in enumerate(roidb):
if i % 2500 == 0:
logger.info(' {:d}/{:d}'.format(i + 1, len(roidb)))
boxes = proposals['boxes'][i]
# Sanity check that these boxes are for the correct image id
assert entry['id'] == proposals[id_field][i]
# Remove duplicate boxes and very small boxes and then take top k
boxes = box_utils.clip_boxes_to_image(
boxes, entry['height'], entry['width']
)
keep = box_utils.unique_boxes(boxes)
boxes = boxes[keep, :]
keep = box_utils.filter_small_boxes(boxes, min_proposal_size)
boxes = boxes[keep, :]
if top_k > 0:
boxes = boxes[:top_k, :]
box_list.append(boxes)
_merge_proposal_boxes_into_roidb(roidb, box_list)
if crowd_thresh > 0:
_filter_crowd_proposals(roidb, crowd_thresh)
def _init_keypoints(self):
"""Initialize COCO keypoint information."""
self.keypoints = None
self.keypoint_flip_map = None
self.keypoints_to_id_map = None
self.num_keypoints = 0
# Thus far only the 'person' category has keypoints
if 'person' in self.category_to_id_map:
cat_info = self.COCO.loadCats([self.category_to_id_map['person']])
else:
return
# Check if the annotations contain keypoint data or not
if 'keypoints' in cat_info[0]:
keypoints = cat_info[0]['keypoints']
self.keypoints_to_id_map = dict(
zip(keypoints, range(len(keypoints))))
self.keypoints = keypoints
self.num_keypoints = len(keypoints)
self.keypoint_flip_map = {
'left_eye': 'right_eye',
'left_ear': 'right_ear',
'left_shoulder': 'right_shoulder',
'left_elbow': 'right_elbow',
'left_wrist': 'right_wrist',
'left_hip': 'right_hip',
'left_knee': 'right_knee',
'left_ankle': 'right_ankle'}
def _get_gt_keypoints(self, obj):
"""Return ground truth keypoints."""
if 'keypoints' not in obj:
return None
kp = np.array(obj['keypoints'])
x = kp[0::3] # 0-indexed x coordinates
y = kp[1::3] # 0-indexed y coordinates
# 0: not labeled; 1: labeled, not inside mask;
# 2: labeled and inside mask
v = kp[2::3]
        num_keypoints = len(obj['keypoints']) // 3  # integer division intended
assert num_keypoints == self.num_keypoints
gt_kps = np.ones((3, self.num_keypoints), dtype=np.int32)
for i in range(self.num_keypoints):
gt_kps[0, i] = x[i]
gt_kps[1, i] = y[i]
gt_kps[2, i] = v[i]
return gt_kps
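# Example usage of JsonDataset (a sketch; assumes the dataset symlinks from
# lib/datasets/data/README.md are in place):
#   ds = JsonDataset('coco_2014_minival')
#   roidb = ds.get_roidb(gt=True)
#   roidb[0]['boxes']  # (num_gt_objects, 4) array of (x1, y1, x2, y2) boxes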
def add_proposals(roidb, rois, scales, crowd_thresh):
    """Add proposal boxes (rois) to an roidb that has ground-truth annotations
    but no proposals. If the proposals are not at the original image scale,
    pass the per-image scale factors that separate the two in scales.
    """
box_list = []
for i in range(len(roidb)):
inv_im_scale = 1. / scales[i]
idx = np.where(rois[:, 0] == i)[0]
box_list.append(rois[idx, 1:] * inv_im_scale)
_merge_proposal_boxes_into_roidb(roidb, box_list)
if crowd_thresh > 0:
_filter_crowd_proposals(roidb, crowd_thresh)
_add_class_assignments(roidb)
def _merge_proposal_boxes_into_roidb(roidb, box_list):
"""Add proposal boxes to each roidb entry."""
assert len(box_list) == len(roidb)
for i, entry in enumerate(roidb):
boxes = box_list[i]
num_boxes = boxes.shape[0]
gt_overlaps = np.zeros(
(num_boxes, entry['gt_overlaps'].shape[1]),
dtype=entry['gt_overlaps'].dtype
)
box_to_gt_ind_map = -np.ones(
(num_boxes), dtype=entry['box_to_gt_ind_map'].dtype
)
# Note: unlike in other places, here we intentionally include all gt
# rois, even ones marked as crowd. Boxes that overlap with crowds will
# be filtered out later (see: _filter_crowd_proposals).
gt_inds = np.where(entry['gt_classes'] > 0)[0]
if len(gt_inds) > 0:
gt_boxes = entry['boxes'][gt_inds, :]
gt_classes = entry['gt_classes'][gt_inds]
proposal_to_gt_overlaps = box_utils.bbox_overlaps(
boxes.astype(dtype=np.float32, copy=False),
gt_boxes.astype(dtype=np.float32, copy=False)
)
# Gt box that overlaps each input box the most
# (ties are broken arbitrarily by class order)
argmaxes = proposal_to_gt_overlaps.argmax(axis=1)
# Amount of that overlap
maxes = proposal_to_gt_overlaps.max(axis=1)
# Those boxes with non-zero overlap with gt boxes
I = np.where(maxes > 0)[0]
# Record max overlaps with the class of the appropriate gt box
gt_overlaps[I, gt_classes[argmaxes[I]]] = maxes[I]
box_to_gt_ind_map[I] = gt_inds[argmaxes[I]]
entry['boxes'] = np.append(
entry['boxes'],
boxes.astype(entry['boxes'].dtype, copy=False),
axis=0
)
entry['gt_classes'] = np.append(
entry['gt_classes'],
np.zeros((num_boxes), dtype=entry['gt_classes'].dtype)
)
entry['seg_areas'] = np.append(
entry['seg_areas'],
np.zeros((num_boxes), dtype=entry['seg_areas'].dtype)
)
entry['gt_overlaps'] = np.append(
entry['gt_overlaps'].toarray(), gt_overlaps, axis=0
)
entry['gt_overlaps'] = scipy.sparse.csr_matrix(entry['gt_overlaps'])
entry['is_crowd'] = np.append(
entry['is_crowd'],
np.zeros((num_boxes), dtype=entry['is_crowd'].dtype)
)
entry['box_to_gt_ind_map'] = np.append(
entry['box_to_gt_ind_map'],
box_to_gt_ind_map.astype(
entry['box_to_gt_ind_map'].dtype, copy=False
)
)
def _filter_crowd_proposals(roidb, crowd_thresh):
    """Find proposals that are inside crowd regions and mark them with an
    overlap of -1 to every ground-truth RoI, which means they will be excluded
    from training.
    """
for entry in roidb:
gt_overlaps = entry['gt_overlaps'].toarray()
crowd_inds = np.where(entry['is_crowd'] == 1)[0]
non_gt_inds = np.where(entry['gt_classes'] == 0)[0]
if len(crowd_inds) == 0 or len(non_gt_inds) == 0:
continue
crowd_boxes = box_utils.xyxy_to_xywh(entry['boxes'][crowd_inds, :])
non_gt_boxes = box_utils.xyxy_to_xywh(entry['boxes'][non_gt_inds, :])
iscrowd_flags = [int(True)] * len(crowd_inds)
ious = COCOmask.iou(non_gt_boxes, crowd_boxes, iscrowd_flags)
bad_inds = np.where(ious.max(axis=1) > crowd_thresh)[0]
gt_overlaps[non_gt_inds[bad_inds], :] = -1
entry['gt_overlaps'] = scipy.sparse.csr_matrix(gt_overlaps)
def _add_class_assignments(roidb):
"""Compute object category assignment for each box associated with each
roidb entry.
"""
for entry in roidb:
gt_overlaps = entry['gt_overlaps'].toarray()
# max overlap with gt over classes (columns)
max_overlaps = gt_overlaps.max(axis=1)
# gt class that had the max overlap
max_classes = gt_overlaps.argmax(axis=1)
entry['max_classes'] = max_classes
entry['max_overlaps'] = max_overlaps
# sanity checks
# if max overlap is 0, the class must be background (class 0)
zero_inds = np.where(max_overlaps == 0)[0]
assert all(max_classes[zero_inds] == 0)
# if max overlap > 0, the class must be a fg class (not class 0)
nonzero_inds = np.where(max_overlaps > 0)[0]
assert all(max_classes[nonzero_inds] != 0)
def _sort_proposals(proposals, id_field):
"""Sort proposals by the specified id field."""
order = np.argsort(proposals[id_field])
fields_to_sort = ['boxes', id_field, 'scores']
for k in fields_to_sort:
proposals[k] = [proposals[k][i] for i in order]
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Functions for evaluating results computed for a json dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import json
import logging
import numpy as np
import os
import uuid
from pycocotools.cocoeval import COCOeval
from core.config import cfg
from utils.io import save_object
import utils.boxes as box_utils
logger = logging.getLogger(__name__)
def evaluate_masks(
json_dataset,
all_boxes,
all_segms,
output_dir,
use_salt=True,
cleanup=False
):
res_file = os.path.join(
output_dir, 'segmentations_' + json_dataset.name + '_results'
)
if use_salt:
res_file += '_{}'.format(str(uuid.uuid4()))
res_file += '.json'
_write_coco_segms_results_file(
json_dataset, all_boxes, all_segms, res_file)
# Only do evaluation on non-test sets (annotations are undisclosed on test)
if json_dataset.name.find('test') == -1:
coco_eval = _do_segmentation_eval(json_dataset, res_file, output_dir)
else:
coco_eval = None
# Optionally cleanup results json file
if cleanup:
os.remove(res_file)
return coco_eval
def _write_coco_segms_results_file(
json_dataset, all_boxes, all_segms, res_file
):
# [{"image_id": 42,
# "category_id": 18,
# "segmentation": [...],
# "score": 0.236}, ...]
results = []
for cls_ind, cls in enumerate(json_dataset.classes):
if cls == '__background__':
continue
if cls_ind >= len(all_boxes):
break
cat_id = json_dataset.category_to_id_map[cls]
results.extend(_coco_segms_results_one_category(
json_dataset, all_boxes[cls_ind], all_segms[cls_ind], cat_id))
logger.info(
'Writing segmentation results json to: {}'.format(
os.path.abspath(res_file)))
with open(res_file, 'w') as fid:
json.dump(results, fid)
def _coco_segms_results_one_category(json_dataset, boxes, segms, cat_id):
results = []
image_ids = json_dataset.COCO.getImgIds()
image_ids.sort()
assert len(boxes) == len(image_ids)
assert len(segms) == len(image_ids)
for i, image_id in enumerate(image_ids):
dets = boxes[i]
rles = segms[i]
if isinstance(dets, list) and len(dets) == 0:
continue
dets = dets.astype(np.float)
scores = dets[:, -1]
results.extend(
[{'image_id': image_id,
'category_id': cat_id,
'segmentation': rles[k],
'score': scores[k]}
for k in range(dets.shape[0])])
return results
def _do_segmentation_eval(json_dataset, res_file, output_dir):
coco_dt = json_dataset.COCO.loadRes(str(res_file))
coco_eval = COCOeval(json_dataset.COCO, coco_dt, 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
_log_detection_eval_metrics(json_dataset, coco_eval)
eval_file = os.path.join(output_dir, 'segmentation_results.pkl')
save_object(coco_eval, eval_file)
logger.info('Wrote json eval results to: {}'.format(eval_file))
return coco_eval
def evaluate_boxes(
json_dataset, all_boxes, output_dir, use_salt=True, cleanup=False
):
res_file = os.path.join(
output_dir, 'bbox_' + json_dataset.name + '_results'
)
if use_salt:
res_file += '_{}'.format(str(uuid.uuid4()))
res_file += '.json'
_write_coco_bbox_results_file(json_dataset, all_boxes, res_file)
# Only do evaluation on non-test sets (annotations are undisclosed on test)
if json_dataset.name.find('test') == -1:
coco_eval = _do_detection_eval(json_dataset, res_file, output_dir)
else:
coco_eval = None
# Optionally cleanup results json file
if cleanup:
os.remove(res_file)
return coco_eval
def _write_coco_bbox_results_file(json_dataset, all_boxes, res_file):
# [{"image_id": 42,
# "category_id": 18,
# "bbox": [258.15,41.29,348.26,243.78],
# "score": 0.236}, ...]
results = []
for cls_ind, cls in enumerate(json_dataset.classes):
if cls == '__background__':
continue
if cls_ind >= len(all_boxes):
break
cat_id = json_dataset.category_to_id_map[cls]
results.extend(_coco_bbox_results_one_category(
json_dataset, all_boxes[cls_ind], cat_id))
logger.info(
'Writing bbox results json to: {}'.format(os.path.abspath(res_file)))
with open(res_file, 'w') as fid:
json.dump(results, fid)
def _coco_bbox_results_one_category(json_dataset, boxes, cat_id):
results = []
image_ids = json_dataset.COCO.getImgIds()
image_ids.sort()
assert len(boxes) == len(image_ids)
for i, image_id in enumerate(image_ids):
dets = boxes[i]
if isinstance(dets, list) and len(dets) == 0:
continue
dets = dets.astype(np.float)
scores = dets[:, -1]
xywh_dets = box_utils.xyxy_to_xywh(dets[:, 0:4])
xs = xywh_dets[:, 0]
ys = xywh_dets[:, 1]
ws = xywh_dets[:, 2]
hs = xywh_dets[:, 3]
results.extend(
[{'image_id': image_id,
'category_id': cat_id,
'bbox': [xs[k], ys[k], ws[k], hs[k]],
'score': scores[k]} for k in range(dets.shape[0])])
return results
def _do_detection_eval(json_dataset, res_file, output_dir):
coco_dt = json_dataset.COCO.loadRes(str(res_file))
coco_eval = COCOeval(json_dataset.COCO, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
_log_detection_eval_metrics(json_dataset, coco_eval)
eval_file = os.path.join(output_dir, 'detection_results.pkl')
save_object(coco_eval, eval_file)
logger.info('Wrote json eval results to: {}'.format(eval_file))
return coco_eval
def _log_detection_eval_metrics(json_dataset, coco_eval):
def _get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
IoU_lo_thresh = 0.5
IoU_hi_thresh = 0.95
ind_lo = _get_thr_ind(coco_eval, IoU_lo_thresh)
ind_hi = _get_thr_ind(coco_eval, IoU_hi_thresh)
# precision has dims (iou, recall, cls, area range, max dets)
# area range index 0: all area ranges
# max dets index 2: 100 per image
precision = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2]
ap_default = np.mean(precision[precision > -1])
logger.info(
'~~~~ Mean and per-category AP @ IoU=[{:.2f},{:.2f}] ~~~~'.format(
IoU_lo_thresh, IoU_hi_thresh))
logger.info('{:.1f}'.format(100 * ap_default))
for cls_ind, cls in enumerate(json_dataset.classes):
if cls == '__background__':
continue
# minus 1 because of __background__
precision = coco_eval.eval['precision'][
ind_lo:(ind_hi + 1), :, cls_ind - 1, 0, 2]
ap = np.mean(precision[precision > -1])
logger.info('{:.1f}'.format(100 * ap))
logger.info('~~~~ Summary metrics ~~~~')
coco_eval.summarize()
def evaluate_box_proposals(
json_dataset, roidb, thresholds=None, area='all', limit=None
):
"""Evaluate detection proposal recall metrics. This function is a much
faster alternative to the official COCO API recall evaluation code. However,
it produces slightly different results.
"""
# Record max overlap value for each gt box
# Return vector of overlap values
areas = {
'all': 0,
'small': 1,
'medium': 2,
'large': 3,
'96-128': 4,
'128-256': 5,
'256-512': 6,
'512-inf': 7}
area_ranges = [
[0**2, 1e5**2], # all
[0**2, 32**2], # small
[32**2, 96**2], # medium
[96**2, 1e5**2], # large
[96**2, 128**2], # 96-128
[128**2, 256**2], # 128-256
[256**2, 512**2], # 256-512
[512**2, 1e5**2]] # 512-inf
assert area in areas, 'Unknown area range: {}'.format(area)
area_range = area_ranges[areas[area]]
gt_overlaps = np.zeros(0)
num_pos = 0
for entry in roidb:
gt_inds = np.where(
(entry['gt_classes'] > 0) & (entry['is_crowd'] == 0))[0]
gt_boxes = entry['boxes'][gt_inds, :]
gt_areas = entry['seg_areas'][gt_inds]
valid_gt_inds = np.where(
(gt_areas >= area_range[0]) & (gt_areas <= area_range[1]))[0]
gt_boxes = gt_boxes[valid_gt_inds, :]
num_pos += len(valid_gt_inds)
non_gt_inds = np.where(entry['gt_classes'] == 0)[0]
boxes = entry['boxes'][non_gt_inds, :]
if boxes.shape[0] == 0:
continue
if limit is not None and boxes.shape[0] > limit:
boxes = boxes[:limit, :]
overlaps = box_utils.bbox_overlaps(
boxes.astype(dtype=np.float32, copy=False),
gt_boxes.astype(dtype=np.float32, copy=False))
_gt_overlaps = np.zeros((gt_boxes.shape[0]))
for j in range(min(boxes.shape[0], gt_boxes.shape[0])):
# find which proposal box maximally covers each gt box
argmax_overlaps = overlaps.argmax(axis=0)
# and get the iou amount of coverage for each gt box
max_overlaps = overlaps.max(axis=0)
# find which gt box is 'best' covered (i.e. 'best' = most iou)
gt_ind = max_overlaps.argmax()
gt_ovr = max_overlaps.max()
assert gt_ovr >= 0
# find the proposal box that covers the best covered gt box
box_ind = argmax_overlaps[gt_ind]
# record the iou coverage of this gt box
_gt_overlaps[j] = overlaps[box_ind, gt_ind]
assert _gt_overlaps[j] == gt_ovr
# mark the proposal box and the gt box as used
overlaps[box_ind, :] = -1
overlaps[:, gt_ind] = -1
# append recorded iou coverage level
gt_overlaps = np.hstack((gt_overlaps, _gt_overlaps))
gt_overlaps = np.sort(gt_overlaps)
if thresholds is None:
step = 0.05
thresholds = np.arange(0.5, 0.95 + 1e-5, step)
recalls = np.zeros_like(thresholds)
# compute recall for each iou threshold
for i, t in enumerate(thresholds):
recalls[i] = (gt_overlaps >= t).sum() / float(num_pos)
# ar = 2 * np.trapz(recalls, thresholds)
ar = recalls.mean()
return {'ar': ar, 'recalls': recalls, 'thresholds': thresholds,
'gt_overlaps': gt_overlaps, 'num_pos': num_pos}
def evaluate_keypoints(
json_dataset,
all_boxes,
all_keypoints,
output_dir,
use_salt=True,
cleanup=False
):
res_file = os.path.join(
output_dir, 'keypoints_' + json_dataset.name + '_results'
)
if use_salt:
res_file += '_{}'.format(str(uuid.uuid4()))
res_file += '.json'
_write_coco_keypoint_results_file(
json_dataset, all_boxes, all_keypoints, res_file)
# Only do evaluation on non-test sets (annotations are undisclosed on test)
if json_dataset.name.find('test') == -1:
coco_eval = _do_keypoint_eval(json_dataset, res_file, output_dir)
else:
coco_eval = None
# Optionally cleanup results json file
if cleanup:
os.remove(res_file)
return coco_eval
def _write_coco_keypoint_results_file(
json_dataset, all_boxes, all_keypoints, res_file
):
results = []
for cls_ind, cls in enumerate(json_dataset.classes):
if cls == '__background__':
continue
if cls_ind >= len(all_keypoints):
break
logger.info(
'Collecting {} results ({:d}/{:d})'.format(
cls, cls_ind, len(all_keypoints) - 1))
cat_id = json_dataset.category_to_id_map[cls]
results.extend(_coco_kp_results_one_category(
json_dataset, all_boxes[cls_ind], all_keypoints[cls_ind], cat_id))
logger.info(
'Writing keypoint results json to: {}'.format(
os.path.abspath(res_file)))
with open(res_file, 'w') as fid:
json.dump(results, fid)
def _coco_kp_results_one_category(json_dataset, boxes, kps, cat_id):
results = []
image_ids = json_dataset.COCO.getImgIds()
image_ids.sort()
assert len(kps) == len(image_ids)
assert len(boxes) == len(image_ids)
use_box_score = False
if cfg.KRCNN.KEYPOINT_CONFIDENCE == 'logit':
# This is ugly; see utils.keypoints.heatmap_to_keypoints for the magic
# indexes
score_index = 2
elif cfg.KRCNN.KEYPOINT_CONFIDENCE == 'prob':
score_index = 3
elif cfg.KRCNN.KEYPOINT_CONFIDENCE == 'bbox':
use_box_score = True
else:
raise ValueError(
'KRCNN.KEYPOINT_CONFIDENCE must be "logit", "prob", or "bbox"')
for i, image_id in enumerate(image_ids):
if len(boxes[i]) == 0:
continue
kps_dets = kps[i]
scores = boxes[i][:, -1].astype(np.float)
if len(kps_dets) == 0:
continue
for j in range(len(kps_dets)):
xy = []
kps_score = 0
for k in range(kps_dets[j].shape[1]):
xy.append(float(kps_dets[j][0, k]))
xy.append(float(kps_dets[j][1, k]))
xy.append(1)
if not use_box_score:
kps_score += kps_dets[j][score_index, k]
if use_box_score:
kps_score = scores[j]
else:
kps_score /= kps_dets[j].shape[1]
results.extend([{'image_id': image_id,
'category_id': cat_id,
'keypoints': xy,
'score': kps_score}])
return results
def _do_keypoint_eval(json_dataset, res_file, output_dir):
ann_type = 'keypoints'
imgIds = json_dataset.COCO.getImgIds()
imgIds.sort()
coco_dt = json_dataset.COCO.loadRes(res_file)
coco_eval = COCOeval(json_dataset.COCO, coco_dt, ann_type)
coco_eval.params.imgIds = imgIds
coco_eval.evaluate()
coco_eval.accumulate()
eval_file = os.path.join(output_dir, 'keypoint_results.pkl')
save_object(coco_eval, eval_file)
logger.info('Wrote json eval results to: {}'.format(eval_file))
coco_eval.summarize()
return coco_eval
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Functions for common roidb manipulations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from past.builtins import basestring
import logging
import numpy as np
from core.config import cfg
from datasets.json_dataset import JsonDataset
import utils.boxes as box_utils
import utils.keypoints as keypoint_utils
import utils.segms as segm_utils
logger = logging.getLogger(__name__)
def combined_roidb_for_training(dataset_names, proposal_files):
"""Load and concatenate roidbs for one or more datasets, along with optional
object proposals. The roidb entries are then prepared for use in training,
which involves caching certain types of metadata for each roidb entry.
"""
def get_roidb(dataset_name, proposal_file):
ds = JsonDataset(dataset_name)
roidb = ds.get_roidb(
gt=True,
proposal_file=proposal_file,
crowd_filter_thresh=cfg.TRAIN.CROWD_FILTER_THRESH
)
if cfg.TRAIN.USE_FLIPPED:
logger.info('Appending horizontally-flipped training examples...')
extend_with_flipped_entries(roidb, ds)
logger.info('Loaded dataset: {:s}'.format(ds.name))
return roidb
if isinstance(dataset_names, basestring):
dataset_names = (dataset_names, )
if isinstance(proposal_files, basestring):
proposal_files = (proposal_files, )
if len(proposal_files) == 0:
proposal_files = (None, ) * len(dataset_names)
assert len(dataset_names) == len(proposal_files)
roidbs = [get_roidb(*args) for args in zip(dataset_names, proposal_files)]
roidb = roidbs[0]
for r in roidbs[1:]:
roidb.extend(r)
roidb = filter_for_training(roidb)
logger.info('Computing bounding-box regression targets...')
add_bbox_regression_targets(roidb)
logger.info('done')
_compute_and_log_stats(roidb)
return roidb
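# Example usage (a sketch): train on COCO train plus valminusminival with no
# precomputed proposals (an empty proposal_files tuple expands to one None per
# dataset above):
#   roidb = combined_roidb_for_training(
#       ('coco_2014_train', 'coco_2014_valminusminival'), ()
#   )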
def extend_with_flipped_entries(roidb, dataset):
    """Flip each entry in the given roidb and extend the roidb, in place, with
    the flipped entries.
    "Flipping" an entry means that the image and its associated metadata (e.g.,
    ground truth boxes and object proposals) are horizontally flipped.
    """
flipped_roidb = []
for entry in roidb:
width = entry['width']
boxes = entry['boxes'].copy()
oldx1 = boxes[:, 0].copy()
oldx2 = boxes[:, 2].copy()
boxes[:, 0] = width - oldx2 - 1
boxes[:, 2] = width - oldx1 - 1
assert (boxes[:, 2] >= boxes[:, 0]).all()
flipped_entry = {}
dont_copy = ('boxes', 'segms', 'gt_keypoints', 'flipped')
for k, v in entry.items():
if k not in dont_copy:
flipped_entry[k] = v
flipped_entry['boxes'] = boxes
flipped_entry['segms'] = segm_utils.flip_segms(
entry['segms'], entry['height'], entry['width']
)
if dataset.keypoints is not None:
flipped_entry['gt_keypoints'] = keypoint_utils.flip_keypoints(
dataset.keypoints, dataset.keypoint_flip_map,
entry['gt_keypoints'], entry['width']
)
flipped_entry['flipped'] = True
flipped_roidb.append(flipped_entry)
roidb.extend(flipped_roidb)
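# Worked example of the flip arithmetic above: in an image of width 100, a box
# with (x1, x2) = (10, 30) maps to (100 - 30 - 1, 100 - 10 - 1) = (69, 89).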
def filter_for_training(roidb):
"""Remove roidb entries that have no usable RoIs based on config settings.
"""
def is_valid(entry):
# Valid images have:
# (1) At least one foreground RoI OR
# (2) At least one background RoI
overlaps = entry['max_overlaps']
# find boxes with sufficient overlap
fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
(overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
# image is only valid if such boxes exist
valid = len(fg_inds) > 0 or len(bg_inds) > 0
if cfg.MODEL.KEYPOINTS_ON:
# If we're training for keypoints, exclude images with no keypoints
valid = valid and entry['has_visible_keypoints']
return valid
num = len(roidb)
filtered_roidb = [entry for entry in roidb if is_valid(entry)]
num_after = len(filtered_roidb)
logger.info('Filtered {} roidb entries: {} -> {}'.
format(num - num_after, num, num_after))
return filtered_roidb
def add_bbox_regression_targets(roidb):
"""Add information needed to train bounding-box regressors."""
for entry in roidb:
entry['bbox_targets'] = _compute_targets(entry)
def _compute_targets(entry):
"""Compute bounding-box regression targets for an image."""
# Indices of ground-truth ROIs
rois = entry['boxes']
overlaps = entry['max_overlaps']
labels = entry['max_classes']
gt_inds = np.where((entry['gt_classes'] > 0) & (entry['is_crowd'] == 0))[0]
# Targets has format (class, tx, ty, tw, th)
targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
if len(gt_inds) == 0:
# Bail if the image has no ground-truth ROIs
return targets
# Indices of examples for which we try to make predictions
ex_inds = np.where(overlaps >= cfg.TRAIN.BBOX_THRESH)[0]
# Get IoU overlap between each ex ROI and gt ROI
ex_gt_overlaps = box_utils.bbox_overlaps(
rois[ex_inds, :].astype(dtype=np.float32, copy=False),
rois[gt_inds, :].astype(dtype=np.float32, copy=False))
# Find which gt ROI each ex ROI has max overlap with:
# this will be the ex ROI's gt target
gt_assignment = ex_gt_overlaps.argmax(axis=1)
gt_rois = rois[gt_inds[gt_assignment], :]
ex_rois = rois[ex_inds, :]
# Use class "1" for all boxes if using class_agnostic_bbox_reg
targets[ex_inds, 0] = (
1 if cfg.MODEL.CLS_AGNOSTIC_BBOX_REG else labels[ex_inds])
targets[ex_inds, 1:] = box_utils.bbox_transform_inv(
ex_rois, gt_rois, cfg.MODEL.BBOX_REG_WEIGHTS)
return targets
def _compute_and_log_stats(roidb):
classes = roidb[0]['dataset'].classes
char_len = np.max([len(c) for c in classes])
hist_bins = np.arange(len(classes) + 1)
# Histogram of ground-truth objects
gt_hist = np.zeros((len(classes)), dtype=np.int)
for entry in roidb:
gt_inds = np.where(
(entry['gt_classes'] > 0) & (entry['is_crowd'] == 0))[0]
gt_classes = entry['gt_classes'][gt_inds]
gt_hist += np.histogram(gt_classes, bins=hist_bins)[0]
logger.info('Ground-truth class histogram:')
for i, v in enumerate(gt_hist):
logger.info(
'{:d}{:s}: {:d}'.format(
i, classes[i].rjust(char_len), v))
logger.info('-' * char_len)
logger.info(
'{:s}: {:d}'.format(
'total'.rjust(char_len), np.sum(gt_hist)))
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Evaluation interface for supported tasks (box detection, instance
segmentation, keypoint detection, ...).
Results are stored in an OrderedDict with the following nested structure:
<dataset>:
<task>:
<metric>: <val>
<dataset> is any valid dataset (e.g., 'coco_2014_minival')
<task> is in ['box', 'mask', 'keypoint', 'box_proposal']
<metric> can be ['AP', 'AP50', 'AP75', 'APs', 'APm', 'APl', 'AR@1000',
'ARs@1000', 'ARm@1000', 'ARl@1000', ...]
<val> is a floating point number
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import OrderedDict
import logging
import os
import pprint
from core.config import cfg
from utils.logging import send_email
import datasets.cityscapes_json_dataset_evaluator as cs_json_dataset_evaluator
import datasets.json_dataset_evaluator as json_dataset_evaluator
import datasets.voc_dataset_evaluator as voc_dataset_evaluator
logger = logging.getLogger(__name__)
def evaluate_all(
dataset, all_boxes, all_segms, all_keyps, output_dir, use_matlab=False
):
"""Evaluate "all" tasks, where "all" includes box detection, instance
segmentation, and keypoint detection.
"""
all_results = evaluate_boxes(
dataset, all_boxes, output_dir, use_matlab=use_matlab
)
logger.info('Evaluating bounding boxes is done!')
if cfg.MODEL.MASK_ON:
results = evaluate_masks(dataset, all_boxes, all_segms, output_dir)
all_results[dataset.name].update(results[dataset.name])
logger.info('Evaluating segmentations is done!')
if cfg.MODEL.KEYPOINTS_ON:
results = evaluate_keypoints(dataset, all_boxes, all_keyps, output_dir)
all_results[dataset.name].update(results[dataset.name])
logger.info('Evaluating keypoints is done!')
return all_results
def evaluate_boxes(dataset, all_boxes, output_dir, use_matlab=False):
"""Evaluate bounding box detection."""
logger.info('Evaluating detections')
not_comp = not cfg.TEST.COMPETITION_MODE
if _use_json_dataset_evaluator(dataset):
coco_eval = json_dataset_evaluator.evaluate_boxes(
dataset, all_boxes, output_dir, use_salt=not_comp, cleanup=not_comp
)
box_results = _coco_eval_to_box_results(coco_eval)
elif _use_cityscapes_evaluator(dataset):
logger.warn('Cityscapes bbox evaluated using COCO metrics/conversions')
coco_eval = json_dataset_evaluator.evaluate_boxes(
dataset, all_boxes, output_dir, use_salt=not_comp, cleanup=not_comp
)
box_results = _coco_eval_to_box_results(coco_eval)
elif _use_voc_evaluator(dataset):
# For VOC, always use salt and always cleanup because results are
# written to the shared VOCdevkit results directory
voc_eval = voc_dataset_evaluator.evaluate_boxes(
dataset, all_boxes, output_dir, use_matlab=use_matlab
)
box_results = _voc_eval_to_box_results(voc_eval)
else:
raise NotImplementedError(
'No evaluator for dataset: {}'.format(dataset.name)
)
return OrderedDict([(dataset.name, box_results)])
def evaluate_masks(dataset, all_boxes, all_segms, output_dir):
"""Evaluate instance segmentation."""
logger.info('Evaluating segmentations')
not_comp = not cfg.TEST.COMPETITION_MODE
if _use_json_dataset_evaluator(dataset):
coco_eval = json_dataset_evaluator.evaluate_masks(
dataset,
all_boxes,
all_segms,
output_dir,
use_salt=not_comp,
cleanup=not_comp
)
mask_results = _coco_eval_to_mask_results(coco_eval)
elif _use_cityscapes_evaluator(dataset):
cs_eval = cs_json_dataset_evaluator.evaluate_masks(
dataset,
all_boxes,
all_segms,
output_dir,
use_salt=not_comp,
cleanup=not_comp
)
mask_results = _cs_eval_to_mask_results(cs_eval)
else:
raise NotImplementedError(
'No evaluator for dataset: {}'.format(dataset.name)
)
return OrderedDict([(dataset.name, mask_results)])
def evaluate_keypoints(dataset, all_boxes, all_keyps, output_dir):
"""Evaluate human keypoint detection (i.e., 2D pose estimation)."""
logger.info('Evaluating detections')
not_comp = not cfg.TEST.COMPETITION_MODE
assert dataset.name.startswith('keypoints_coco_'), \
'Only COCO keypoints are currently supported'
coco_eval = json_dataset_evaluator.evaluate_keypoints(
dataset,
all_boxes,
all_keyps,
output_dir,
use_salt=not_comp,
cleanup=not_comp
)
keypoint_results = _coco_eval_to_keypoint_results(coco_eval)
return OrderedDict([(dataset.name, keypoint_results)])
def evaluate_box_proposals(dataset, roidb):
"""Evaluate bounding box object proposals."""
res = _empty_box_proposal_results()
areas = {'all': '', 'small': 's', 'medium': 'm', 'large': 'l'}
for limit in [100, 1000]:
for area, suffix in areas.items():
stats = json_dataset_evaluator.evaluate_box_proposals(
dataset, roidb, area=area, limit=limit
)
key = 'AR{}@{:d}'.format(suffix, limit)
res['box_proposal'][key] = stats['ar']
return OrderedDict([(dataset.name, res)])
def log_box_proposal_results(results):
"""Log bounding box proposal results."""
for dataset in results.keys():
keys = results[dataset]['box_proposal'].keys()
pad = max([len(k) for k in keys])
logger.info(dataset)
for k, v in results[dataset]['box_proposal'].items():
logger.info('{}: {:.3f}'.format(k.ljust(pad), v))
def log_copy_paste_friendly_results(results):
"""Log results in a format that makes it easy to copy-and-paste in a
spreadsheet. Lines are prefixed with 'copypaste: ' to make grepping easy.
"""
for dataset in results.keys():
logger.info('copypaste: Dataset: {}'.format(dataset))
for task, metrics in results[dataset].items():
logger.info('copypaste: Task: {}'.format(task))
metric_names = metrics.keys()
metric_vals = ['{:.4f}'.format(v) for v in metrics.values()]
logger.info('copypaste: ' + ','.join(metric_names))
logger.info('copypaste: ' + ','.join(metric_vals))
def check_expected_results(results, atol=0.005, rtol=0.1):
"""Check actual results against expected results stored in
    cfg.EXPECTED_RESULTS. Optionally send email if the mismatch exceeds the
    specified tolerance.
Expected results should take the form of a list of expectations, each
specified by four elements: [dataset, task, metric, expected value]. For
example: [['coco_2014_minival', 'box_proposal', 'AR@1000', 0.387], ...].
"""
# cfg contains a reference set of results that we want to check against
if len(cfg.EXPECTED_RESULTS) == 0:
return
for dataset, task, metric, expected_val in cfg.EXPECTED_RESULTS:
assert dataset in results, 'Dataset {} not in results'.format(dataset)
assert task in results[dataset], 'Task {} not in results'.format(task)
assert metric in results[dataset][task], \
'Metric {} not in results'.format(metric)
actual_val = results[dataset][task][metric]
err = abs(actual_val - expected_val)
tol = atol + rtol * abs(expected_val)
msg = (
'{} > {} > {} sanity check (actual vs. expected): '
'{:.3f} vs. {:.3f}, err={:.3f}, tol={:.3f}'
).format(dataset, task, metric, actual_val, expected_val, err, tol)
if err > tol:
msg = 'FAIL: ' + msg
logger.error(msg)
if cfg.EXPECTED_RESULTS_EMAIL != '':
subject = 'Detectron end-to-end test failure'
                job_name = os.environ.get('DETECTRON_JOB_NAME', '<unknown>')
                job_id = os.environ.get('WORKFLOW_RUN_ID', '<unknown>')
body = [
'Name:',
job_name,
'Run ID:',
job_id,
'Failure:',
msg,
'Config:',
pprint.pformat(cfg),
'Env:',
pprint.pformat(dict(os.environ)),
]
send_email(
subject, '\n\n'.join(body), cfg.EXPECTED_RESULTS_EMAIL
)
else:
msg = 'PASS: ' + msg
logger.info(msg)
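# Worked example of the tolerance check above: for the expectation
# ['coco_2014_minival', 'box_proposal', 'AR@1000', 0.387] with the default
# atol=0.005 and rtol=0.1, tol = 0.005 + 0.1 * 0.387 = 0.0437, so any actual
# value in [0.3433, 0.4307] passes.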
def _use_json_dataset_evaluator(dataset):
"""Check if the dataset uses the general json dataset evaluator."""
return dataset.name.find('coco_') > -1 or cfg.TEST.FORCE_JSON_DATASET_EVAL
def _use_cityscapes_evaluator(dataset):
"""Check if the dataset uses the Cityscapes dataset evaluator."""
return dataset.name.find('cityscapes_') > -1
def _use_voc_evaluator(dataset):
"""Check if the dataset uses the PASCAL VOC dataset evaluator."""
return dataset.name[:4] == 'voc_'
# Indices in the stats array for COCO boxes and masks
COCO_AP = 0
COCO_AP50 = 1
COCO_AP75 = 2
COCO_APS = 3
COCO_APM = 4
COCO_APL = 5
# Slight difference for keypoints
COCO_KPS_APM = 3
COCO_KPS_APL = 4
# ---------------------------------------------------------------------------- #
# Helper functions for producing properly formatted results.
# ---------------------------------------------------------------------------- #
def _coco_eval_to_box_results(coco_eval):
res = _empty_box_results()
if coco_eval is not None:
s = coco_eval.stats
res['box']['AP'] = s[COCO_AP]
res['box']['AP50'] = s[COCO_AP50]
res['box']['AP75'] = s[COCO_AP75]
res['box']['APs'] = s[COCO_APS]
res['box']['APm'] = s[COCO_APM]
res['box']['APl'] = s[COCO_APL]
return res
def _coco_eval_to_mask_results(coco_eval):
res = _empty_mask_results()
if coco_eval is not None:
s = coco_eval.stats
res['mask']['AP'] = s[COCO_AP]
res['mask']['AP50'] = s[COCO_AP50]
res['mask']['AP75'] = s[COCO_AP75]
res['mask']['APs'] = s[COCO_APS]
res['mask']['APm'] = s[COCO_APM]
res['mask']['APl'] = s[COCO_APL]
return res
def _coco_eval_to_keypoint_results(coco_eval):
res = _empty_keypoint_results()
if coco_eval is not None:
s = coco_eval.stats
res['keypoint']['AP'] = s[COCO_AP]
res['keypoint']['AP50'] = s[COCO_AP50]
res['keypoint']['AP75'] = s[COCO_AP75]
res['keypoint']['APm'] = s[COCO_KPS_APM]
res['keypoint']['APl'] = s[COCO_KPS_APL]
return res
def _voc_eval_to_box_results(voc_eval):
# Not supported (return empty results)
return _empty_box_results()
def _cs_eval_to_mask_results(cs_eval):
# Not supported (return empty results)
return _empty_mask_results()
def _empty_box_results():
return OrderedDict({
'box':
OrderedDict(
[
('AP', -1),
('AP50', -1),
('AP75', -1),
('APs', -1),
('APm', -1),
('APl', -1),
]
)
})
def _empty_mask_results():
return OrderedDict({
'mask':
OrderedDict(
[
('AP', -1),
('AP50', -1),
('AP75', -1),
('APs', -1),
('APm', -1),
('APl', -1),
]
)
})
def _empty_keypoint_results():
return OrderedDict({
'keypoint':
OrderedDict(
[
('AP', -1),
('AP50', -1),
('AP75', -1),
('APm', -1),
('APl', -1),
]
)
})
def _empty_box_proposal_results():
return OrderedDict({
'box_proposal':
OrderedDict(
[
('AR@100', -1),
('ARs@100', -1),
('ARm@100', -1),
('ARl@100', -1),
('AR@1000', -1),
('ARs@1000', -1),
('ARm@1000', -1),
('ARl@1000', -1),
]
)
})
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""PASCAL VOC dataset evaluation interface."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
import numpy as np
import os
import shutil
import uuid
from core.config import cfg
from datasets.dataset_catalog import DATASETS
from datasets.dataset_catalog import DEVKIT_DIR
from datasets.voc_eval import voc_eval
from utils.io import save_object
logger = logging.getLogger(__name__)
def evaluate_boxes(
json_dataset,
all_boxes,
output_dir,
use_salt=True,
cleanup=True,
use_matlab=False
):
salt = '_{}'.format(str(uuid.uuid4())) if use_salt else ''
filenames = _write_voc_results_files(json_dataset, all_boxes, salt)
_do_python_eval(json_dataset, salt, output_dir)
if use_matlab:
_do_matlab_eval(json_dataset, salt, output_dir)
if cleanup:
for filename in filenames:
shutil.copy(filename, output_dir)
os.remove(filename)
return None
def _write_voc_results_files(json_dataset, all_boxes, salt):
filenames = []
image_set_path = voc_info(json_dataset)['image_set_path']
assert os.path.exists(image_set_path), \
'Image set path does not exist: {}'.format(image_set_path)
with open(image_set_path, 'r') as f:
image_index = [x.strip() for x in f.readlines()]
# Sanity check that order of images in json dataset matches order in the
# image set
roidb = json_dataset.get_roidb()
for i, entry in enumerate(roidb):
index = os.path.splitext(os.path.split(entry['image'])[1])[0]
assert index == image_index[i]
for cls_ind, cls in enumerate(json_dataset.classes):
if cls == '__background__':
continue
logger.info('Writing VOC results for: {}'.format(cls))
filename = _get_voc_results_file_template(json_dataset,
salt).format(cls)
filenames.append(filename)
assert len(all_boxes[cls_ind]) == len(image_index)
with open(filename, 'wt') as f:
for im_ind, index in enumerate(image_index):
dets = all_boxes[cls_ind][im_ind]
                if isinstance(dets, list):
assert len(dets) == 0, \
'dets should be numpy.ndarray or empty list'
continue
# the VOCdevkit expects 1-based indices
for k in range(dets.shape[0]):
f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
format(index, dets[k, -1],
dets[k, 0] + 1, dets[k, 1] + 1,
dets[k, 2] + 1, dets[k, 3] + 1))
return filenames
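# Each results file holds one detection per line in the standard VOCdevkit
# format: "<image_id> <confidence> <x1> <y1> <x2> <y2>" with 1-based box
# coordinates, e.g. (made-up values): 000004 0.702 89.0 52.0 193.0 351.0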
def _get_voc_results_file_template(json_dataset, salt):
info = voc_info(json_dataset)
year = info['year']
image_set = info['image_set']
devkit_path = info['devkit_path']
# VOCdevkit/results/VOC2007/Main/<comp_id>_det_test_aeroplane.txt
filename = 'comp4' + salt + '_det_' + image_set + '_{:s}.txt'
return os.path.join(devkit_path, 'results', 'VOC' + year, 'Main', filename)
def _do_python_eval(json_dataset, salt, output_dir='output'):
info = voc_info(json_dataset)
year = info['year']
anno_path = info['anno_path']
image_set_path = info['image_set_path']
devkit_path = info['devkit_path']
cachedir = os.path.join(devkit_path, 'annotations_cache')
aps = []
# The PASCAL VOC metric changed in 2010
    use_07_metric = int(year) < 2010
logger.info('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
if not os.path.isdir(output_dir):
os.mkdir(output_dir)
for _, cls in enumerate(json_dataset.classes):
if cls == '__background__':
continue
filename = _get_voc_results_file_template(
json_dataset, salt).format(cls)
rec, prec, ap = voc_eval(
filename, anno_path, image_set_path, cls, cachedir, ovthresh=0.5,
use_07_metric=use_07_metric)
aps += [ap]
logger.info('AP for {} = {:.4f}'.format(cls, ap))
res_file = os.path.join(output_dir, cls + '_pr.pkl')
save_object({'rec': rec, 'prec': prec, 'ap': ap}, res_file)
logger.info('Mean AP = {:.4f}'.format(np.mean(aps)))
logger.info('~~~~~~~~')
logger.info('Results:')
for ap in aps:
logger.info('{:.3f}'.format(ap))
logger.info('{:.3f}'.format(np.mean(aps)))
logger.info('~~~~~~~~')
logger.info('')
logger.info('----------------------------------------------------------')
logger.info('Results computed with the **unofficial** Python eval code.')
logger.info('Results should be very close to the official MATLAB code.')
logger.info('Use `./tools/reval.py --matlab ...` for your paper.')
logger.info('-- Thanks, The Management')
logger.info('----------------------------------------------------------')
def _do_matlab_eval(json_dataset, salt, output_dir='output'):
import subprocess
logger.info('-----------------------------------------------------')
logger.info('Computing results with the official MATLAB eval code.')
logger.info('-----------------------------------------------------')
info = voc_info(json_dataset)
path = os.path.join(
cfg.ROOT_DIR, 'lib', 'datasets', 'VOCdevkit-matlab-wrapper')
cmd = 'cd {} && '.format(path)
cmd += '{:s} -nodisplay -nodesktop '.format(cfg.MATLAB)
cmd += '-r "dbstop if error; '
cmd += 'voc_eval(\'{:s}\',\'{:s}\',\'{:s}\',\'{:s}\'); quit;"' \
.format(info['devkit_path'], 'comp4' + salt, info['image_set'],
output_dir)
logger.info('Running:\n{}'.format(cmd))
subprocess.call(cmd, shell=True)
def voc_info(json_dataset):
year = json_dataset.name[4:8]
image_set = json_dataset.name[9:]
devkit_path = DATASETS[json_dataset.name][DEVKIT_DIR]
assert os.path.exists(devkit_path), \
'Devkit directory {} not found'.format(devkit_path)
anno_path = os.path.join(
devkit_path, 'VOC' + year, 'Annotations', '{:s}.xml')
image_set_path = os.path.join(
devkit_path, 'VOC' + year, 'ImageSets', 'Main', image_set + '.txt')
return dict(
year=year,
image_set=image_set,
devkit_path=devkit_path,
anno_path=anno_path,
image_set_path=image_set_path)
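# Name parsing sketch: voc_info assumes dataset names like 'voc_2007_test',
# from which name[4:8] yields the year ('2007') and name[9:] the image set
# ('test').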
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Bharath Hariharan
# --------------------------------------------------------
"""Python implementation of the PASCAL VOC devkit's AP evaluation code."""
import cPickle
import logging
import numpy as np
import os
import xml.etree.ElementTree as ET
logger = logging.getLogger(__name__)
def parse_rec(filename):
"""Parse a PASCAL VOC xml file."""
tree = ET.parse(filename)
objects = []
for obj in tree.findall('object'):
obj_struct = {}
obj_struct['name'] = obj.find('name').text
obj_struct['pose'] = obj.find('pose').text
obj_struct['truncated'] = int(obj.find('truncated').text)
obj_struct['difficult'] = int(obj.find('difficult').text)
bbox = obj.find('bndbox')
obj_struct['bbox'] = [int(bbox.find('xmin').text),
int(bbox.find('ymin').text),
int(bbox.find('xmax').text),
int(bbox.find('ymax').text)]
objects.append(obj_struct)
return objects
def voc_ap(rec, prec, use_07_metric=False):
"""Compute VOC AP given precision and recall. If use_07_metric is true, uses
the VOC 07 11-point method (default:False).
"""
if use_07_metric:
# 11 point metric
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
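# Minimal sanity check for the area-based AP computation above (toy values,
# assuming two ranked detections where only the first is correct):
#
#   >>> voc_ap(np.array([0.5, 1.0]), np.array([1.0, 0.5]))
#   0.75
#
# The precision envelope keeps p = 1.0 up to recall 0.5 and p = 0.5 after,
# so the area under the curve is 0.5 * 1.0 + 0.5 * 0.5 = 0.75.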
def voc_eval(detpath,
annopath,
imagesetfile,
classname,
cachedir,
ovthresh=0.5,
use_07_metric=False):
"""rec, prec, ap = voc_eval(detpath,
annopath,
imagesetfile,
classname,
[ovthresh],
[use_07_metric])
Top level function that does the PASCAL VOC evaluation.
detpath: Path to detections
detpath.format(classname) should produce the detection results file.
annopath: Path to annotations
annopath.format(imagename) should be the xml annotations file.
imagesetfile: Text file containing the list of images, one image per line.
classname: Category name (duh)
cachedir: Directory for caching the annotations
[ovthresh]: Overlap threshold (default = 0.5)
[use_07_metric]: Whether to use VOC07's 11 point AP computation
(default False)
"""
# assumes detections are in detpath.format(classname)
# assumes annotations are in annopath.format(imagename)
# assumes imagesetfile is a text file with each line an image name
# cachedir caches the annotations in a pickle file
# first load gt
if not os.path.isdir(cachedir):
os.mkdir(cachedir)
imageset = os.path.splitext(os.path.basename(imagesetfile))[0]
cachefile = os.path.join(cachedir, imageset + '_annots.pkl')
# read list of images
with open(imagesetfile, 'r') as f:
lines = f.readlines()
imagenames = [x.strip() for x in lines]
if not os.path.isfile(cachefile):
# load annots
recs = {}
for i, imagename in enumerate(imagenames):
recs[imagename] = parse_rec(annopath.format(imagename))
if i % 100 == 0:
logger.info(
'Reading annotation for {:d}/{:d}'.format(
i + 1, len(imagenames)))
# save
logger.info('Saving cached annotations to {:s}'.format(cachefile))
        with open(cachefile, 'wb') as f:
            cPickle.dump(recs, f)
else:
# load
        with open(cachefile, 'rb') as f:
            recs = cPickle.load(f)
# extract gt objects for this class
class_recs = {}
npos = 0
for imagename in imagenames:
R = [obj for obj in recs[imagename] if obj['name'] == classname]
bbox = np.array([x['bbox'] for x in R])
difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
det = [False] * len(R)
npos = npos + sum(~difficult)
class_recs[imagename] = {'bbox': bbox,
'difficult': difficult,
'det': det}
# read dets
detfile = detpath.format(classname)
with open(detfile, 'r') as f:
lines = f.readlines()
splitlines = [x.strip().split(' ') for x in lines]
image_ids = [x[0] for x in splitlines]
confidence = np.array([float(x[1]) for x in splitlines])
BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
# sort by confidence
sorted_ind = np.argsort(-confidence)
BB = BB[sorted_ind, :]
image_ids = [image_ids[x] for x in sorted_ind]
# go down dets and mark TPs and FPs
nd = len(image_ids)
tp = np.zeros(nd)
fp = np.zeros(nd)
for d in range(nd):
R = class_recs[image_ids[d]]
bb = BB[d, :].astype(float)
ovmax = -np.inf
BBGT = R['bbox'].astype(float)
if BBGT.size > 0:
# compute overlaps
# intersection
ixmin = np.maximum(BBGT[:, 0], bb[0])
iymin = np.maximum(BBGT[:, 1], bb[1])
ixmax = np.minimum(BBGT[:, 2], bb[2])
iymax = np.minimum(BBGT[:, 3], bb[3])
iw = np.maximum(ixmax - ixmin + 1., 0.)
ih = np.maximum(iymax - iymin + 1., 0.)
inters = iw * ih
# union
uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
(BBGT[:, 2] - BBGT[:, 0] + 1.) *
(BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
overlaps = inters / uni
ovmax = np.max(overlaps)
jmax = np.argmax(overlaps)
if ovmax > ovthresh:
if not R['difficult'][jmax]:
if not R['det'][jmax]:
tp[d] = 1.
R['det'][jmax] = 1
else:
fp[d] = 1.
else:
fp[d] = 1.
# compute precision recall
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(npos)
# avoid divide by zero in case the first detection matches a difficult
# ground truth
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric)
return rec, prec, ap
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Functions for using a Feature Pyramid Network (FPN)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import collections
import numpy as np
from core.config import cfg
from modeling.generate_anchors import generate_anchors
from utils.c2 import const_fill
from utils.c2 import gauss_fill
import modeling.ResNet as ResNet
import utils.blob as blob_utils
import utils.boxes as box_utils
# Lowest and highest pyramid levels in the backbone network. For FPN, we assume
# that all networks have 5 spatial reductions, each by a factor of 2. Level 1
# would correspond to the input image, hence it does not make sense to use it.
LOWEST_BACKBONE_LVL = 2 # E.g., "conv2"-like level
HIGHEST_BACKBONE_LVL = 5 # E.g., "conv5"-like level
# ---------------------------------------------------------------------------- #
# FPN with ResNet
# ---------------------------------------------------------------------------- #
def add_fpn_ResNet50_conv5_body(model):
return add_fpn_onto_conv_body(
model, ResNet.add_ResNet50_conv5_body, fpn_level_info_ResNet50_conv5
)
def add_fpn_ResNet50_conv5_P2only_body(model):
return add_fpn_onto_conv_body(
model,
ResNet.add_ResNet50_conv5_body,
fpn_level_info_ResNet50_conv5,
P2only=True
)
def add_fpn_ResNet101_conv5_body(model):
return add_fpn_onto_conv_body(
model, ResNet.add_ResNet101_conv5_body, fpn_level_info_ResNet101_conv5
)
def add_fpn_ResNet101_conv5_P2only_body(model):
return add_fpn_onto_conv_body(
model,
ResNet.add_ResNet101_conv5_body,
fpn_level_info_ResNet101_conv5,
P2only=True
)
def add_fpn_ResNet152_conv5_body(model):
return add_fpn_onto_conv_body(
model, ResNet.add_ResNet152_conv5_body, fpn_level_info_ResNet152_conv5
)
def add_fpn_ResNet152_conv5_P2only_body(model):
return add_fpn_onto_conv_body(
model,
ResNet.add_ResNet152_conv5_body,
fpn_level_info_ResNet152_conv5,
P2only=True
)
# ---------------------------------------------------------------------------- #
# Functions for bolting FPN onto backbone architectures
# ---------------------------------------------------------------------------- #
def add_fpn_onto_conv_body(
model, conv_body_func, fpn_level_info_func, P2only=False
):
"""Add the specified conv body to the model and then add FPN levels to it.
"""
    # Note: blobs_conv is in reversed order: [fpn5, fpn4, fpn3, fpn2]
# similarly for dims_conv: [2048, 1024, 512, 256]
# similarly for spatial_scales_fpn: [1/32, 1/16, 1/8, 1/4]
conv_body_func(model)
blobs_fpn, dim_fpn, spatial_scales_fpn = add_fpn(
model, fpn_level_info_func()
)
if P2only:
# use only the finest level
return blobs_fpn[-1], dim_fpn, spatial_scales_fpn[-1]
else:
# use all levels
return blobs_fpn, dim_fpn, spatial_scales_fpn
def add_fpn(model, fpn_level_info):
"""Add FPN connections based on the model described in the FPN paper."""
    # FPN levels are built starting from the highest/coarsest level of the
# backbone (usually "conv5"). First we build down, recursively constructing
# lower/finer resolution FPN levels. Then we build up, constructing levels
# that are even higher/coarser than the starting level.
fpn_dim = cfg.FPN.DIM
min_level, max_level = get_min_max_levels()
    # Count the number of backbone stages that we will generate FPN levels for,
    # starting from the coarsest backbone stage (usually the "conv5"-like
    # level). E.g., if the backbone level info defines 4 stages: "conv5",
    # "conv4", ..., "conv2" and min_level=2, then we end up with
    # 4 - (2 - 2) = 4 backbone stages to add FPN to.
num_backbone_stages = (
len(fpn_level_info.blobs) - (min_level - LOWEST_BACKBONE_LVL)
)
lateral_input_blobs = fpn_level_info.blobs[:num_backbone_stages]
output_blobs = [
'fpn_inner_{}'.format(s)
for s in fpn_level_info.blobs[:num_backbone_stages]
]
fpn_dim_lateral = fpn_level_info.dims
xavier_fill = ('XavierFill', {})
    # For the coarsest backbone level: 1x1 conv only seeds recursion
model.Conv(
lateral_input_blobs[0],
output_blobs[0],
dim_in=fpn_dim_lateral[0],
dim_out=fpn_dim,
kernel=1,
pad=0,
stride=1,
weight_init=xavier_fill,
bias_init=const_fill(0.0)
)
#
# Step 1: recursively build down starting from the coarsest backbone level
#
# For other levels add top-down and lateral connections
for i in range(num_backbone_stages - 1):
add_topdown_lateral_module(
model,
output_blobs[i], # top-down blob
lateral_input_blobs[i + 1], # lateral blob
output_blobs[i + 1], # next output blob
fpn_dim, # output dimension
fpn_dim_lateral[i + 1] # lateral input dimension
)
# Post-hoc scale-specific 3x3 convs
blobs_fpn = []
spatial_scales = []
for i in range(num_backbone_stages):
fpn_blob = model.Conv(
output_blobs[i],
'fpn_{}'.format(fpn_level_info.blobs[i]),
dim_in=fpn_dim,
dim_out=fpn_dim,
kernel=3,
pad=1,
stride=1,
weight_init=xavier_fill,
bias_init=const_fill(0.0)
)
blobs_fpn += [fpn_blob]
spatial_scales += [fpn_level_info.spatial_scales[i]]
#
# Step 2: build up starting from the coarsest backbone level
#
# Check if we need the P6 feature map
if not cfg.FPN.EXTRA_CONV_LEVELS and max_level == HIGHEST_BACKBONE_LVL + 1:
# Original FPN P6 level implementation from our CVPR'17 FPN paper
P6_blob_in = blobs_fpn[0]
P6_name = P6_blob_in + '_subsampled_2x'
# Use max pooling to simulate stride 2 subsampling
P6_blob = model.MaxPool(P6_blob_in, P6_name, kernel=1, pad=0, stride=2)
blobs_fpn.insert(0, P6_blob)
spatial_scales.insert(0, spatial_scales[0] * 0.5)
# Coarser FPN levels introduced for RetinaNet
if cfg.FPN.EXTRA_CONV_LEVELS and max_level > HIGHEST_BACKBONE_LVL:
fpn_blob = fpn_level_info.blobs[0]
dim_in = fpn_level_info.dims[0]
for i in range(HIGHEST_BACKBONE_LVL + 1, max_level + 1):
fpn_blob_in = fpn_blob
if i > HIGHEST_BACKBONE_LVL + 1:
fpn_blob_in = model.Relu(fpn_blob, fpn_blob + '_relu')
fpn_blob = model.Conv(
fpn_blob_in,
'fpn_' + str(i),
dim_in=dim_in,
dim_out=fpn_dim,
kernel=3,
pad=1,
stride=2,
weight_init=xavier_fill,
bias_init=const_fill(0.0)
)
dim_in = fpn_dim
blobs_fpn.insert(0, fpn_blob)
spatial_scales.insert(0, spatial_scales[0] * 0.5)
return blobs_fpn, fpn_dim, spatial_scales
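# For example (assuming a default ResNet-50 FPN config with max_level=6 and
# no extra conv levels), add_fpn returns blobs ordered coarsest to finest,
# roughly: [P6 (a 2x subsampled copy of the P5 output), fpn_res5_2_sum,
# fpn_res4_5_sum, fpn_res3_3_sum, fpn_res2_2_sum] with spatial scales
# [1/64, 1/32, 1/16, 1/8, 1/4].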
def add_topdown_lateral_module(
model, fpn_top, fpn_lateral, fpn_bottom, dim_top, dim_lateral
):
"""Add a top-down lateral module."""
# Lateral 1x1 conv
lat = model.Conv(
fpn_lateral,
fpn_bottom + '_lateral',
dim_in=dim_lateral,
dim_out=dim_top,
kernel=1,
pad=0,
stride=1,
weight_init=(
const_fill(0.0) if cfg.FPN.ZERO_INIT_LATERAL
else ('XavierFill', {})
),
bias_init=const_fill(0.0)
)
# Top-down 2x upsampling
td = model.net.UpsampleNearest(fpn_top, fpn_bottom + '_topdown', scale=2)
# Sum lateral and top-down
model.net.Sum([lat, td], fpn_bottom)
def get_min_max_levels():
"""The min and max FPN levels required for supporting RPN and/or RoI
transform operations on multiple FPN levels.
"""
min_level = LOWEST_BACKBONE_LVL
max_level = HIGHEST_BACKBONE_LVL
if cfg.FPN.MULTILEVEL_RPN and not cfg.FPN.MULTILEVEL_ROIS:
max_level = cfg.FPN.RPN_MAX_LEVEL
min_level = cfg.FPN.RPN_MIN_LEVEL
if not cfg.FPN.MULTILEVEL_RPN and cfg.FPN.MULTILEVEL_ROIS:
max_level = cfg.FPN.ROI_MAX_LEVEL
min_level = cfg.FPN.ROI_MIN_LEVEL
if cfg.FPN.MULTILEVEL_RPN and cfg.FPN.MULTILEVEL_ROIS:
max_level = max(cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.ROI_MAX_LEVEL)
min_level = min(cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.ROI_MIN_LEVEL)
return min_level, max_level
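# E.g., under a typical FPN config (hypothetical values) with RPN_MIN_LEVEL=2,
# RPN_MAX_LEVEL=6, ROI_MIN_LEVEL=2, ROI_MAX_LEVEL=5 and both MULTILEVEL_*
# options on, this returns (2, 6): the union of the level ranges needed by
# RPN and the RoI transform.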
# ---------------------------------------------------------------------------- #
# RPN with an FPN backbone
# ---------------------------------------------------------------------------- #
def add_fpn_rpn_outputs(model, blobs_in, dim_in, spatial_scales):
"""Add RPN on FPN specific outputs."""
num_anchors = len(cfg.FPN.RPN_ASPECT_RATIOS)
dim_out = dim_in
k_max = cfg.FPN.RPN_MAX_LEVEL # coarsest level of pyramid
k_min = cfg.FPN.RPN_MIN_LEVEL # finest level of pyramid
assert len(blobs_in) == k_max - k_min + 1
for lvl in range(k_min, k_max + 1):
bl_in = blobs_in[k_max - lvl] # blobs_in is in reversed order
sc = spatial_scales[k_max - lvl] # in reversed order
slvl = str(lvl)
if lvl == k_min:
# Create conv ops with randomly initialized weights and
# zeroed biases for the first FPN level; these will be shared by
# all other FPN levels
# RPN hidden representation
conv_rpn_fpn = model.Conv(
bl_in,
'conv_rpn_fpn' + slvl,
dim_in,
dim_out,
kernel=3,
pad=1,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
model.Relu(conv_rpn_fpn, conv_rpn_fpn)
# Proposal classification scores
rpn_cls_logits_fpn = model.Conv(
conv_rpn_fpn,
'rpn_cls_logits_fpn' + slvl,
dim_in,
num_anchors,
kernel=1,
pad=0,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
# Proposal bbox regression deltas
rpn_bbox_pred_fpn = model.Conv(
conv_rpn_fpn,
'rpn_bbox_pred_fpn' + slvl,
dim_in,
4 * num_anchors,
kernel=1,
pad=0,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
else:
# Share weights and biases
sk_min = str(k_min)
# RPN hidden representation
conv_rpn_fpn = model.ConvShared(
bl_in,
'conv_rpn_fpn' + slvl,
dim_in,
dim_out,
kernel=3,
pad=1,
stride=1,
weight='conv_rpn_fpn' + sk_min + '_w',
bias='conv_rpn_fpn' + sk_min + '_b'
)
model.Relu(conv_rpn_fpn, conv_rpn_fpn)
# Proposal classification scores
rpn_cls_logits_fpn = model.ConvShared(
conv_rpn_fpn,
'rpn_cls_logits_fpn' + slvl,
dim_in,
num_anchors,
kernel=1,
pad=0,
stride=1,
weight='rpn_cls_logits_fpn' + sk_min + '_w',
bias='rpn_cls_logits_fpn' + sk_min + '_b'
)
# Proposal bbox regression deltas
rpn_bbox_pred_fpn = model.ConvShared(
conv_rpn_fpn,
'rpn_bbox_pred_fpn' + slvl,
dim_in,
4 * num_anchors,
kernel=1,
pad=0,
stride=1,
weight='rpn_bbox_pred_fpn' + sk_min + '_w',
bias='rpn_bbox_pred_fpn' + sk_min + '_b'
)
if not model.train or cfg.MODEL.FASTER_RCNN:
# Proposals are needed during:
# 1) inference (== not model.train) for RPN only and Faster R-CNN
# OR
# 2) training for Faster R-CNN
# Otherwise (== training for RPN only), proposals are not needed
lvl_anchors = generate_anchors(
stride=2.**lvl,
sizes=(cfg.FPN.RPN_ANCHOR_START_SIZE * 2.**(lvl - k_min), ),
aspect_ratios=cfg.FPN.RPN_ASPECT_RATIOS
)
rpn_cls_probs_fpn = model.net.Sigmoid(
rpn_cls_logits_fpn, 'rpn_cls_probs_fpn' + slvl
)
model.GenerateProposals(
[rpn_cls_probs_fpn, rpn_bbox_pred_fpn, 'im_info'],
['rpn_rois_fpn' + slvl, 'rpn_roi_probs_fpn' + slvl],
anchors=lvl_anchors,
spatial_scale=sc
)
def add_fpn_rpn_losses(model):
"""Add RPN on FPN specific losses."""
loss_gradients = {}
for lvl in range(cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.RPN_MAX_LEVEL + 1):
slvl = str(lvl)
# Spatially narrow the full-sized RPN label arrays to match the feature map
# shape
model.net.SpatialNarrowAs(
['rpn_labels_int32_wide_fpn' + slvl, 'rpn_cls_logits_fpn' + slvl],
'rpn_labels_int32_fpn' + slvl
)
for key in ('targets', 'inside_weights', 'outside_weights'):
model.net.SpatialNarrowAs(
[
'rpn_bbox_' + key + '_wide_fpn' + slvl,
'rpn_bbox_pred_fpn' + slvl
],
'rpn_bbox_' + key + '_fpn' + slvl
)
loss_rpn_cls_fpn = model.net.SigmoidCrossEntropyLoss(
['rpn_cls_logits_fpn' + slvl, 'rpn_labels_int32_fpn' + slvl],
'loss_rpn_cls_fpn' + slvl,
normalize=0,
scale=(
1. / cfg.NUM_GPUS / cfg.TRAIN.RPN_BATCH_SIZE_PER_IM /
cfg.TRAIN.IMS_PER_BATCH
)
)
        # Normalization by (1) RPN_BATCH_SIZE_PER_IM and (2) IMS_PER_BATCH is
        # handled by (1) setting the bbox outside weights and (2) SmoothL1Loss
        # normalizing by IMS_PER_BATCH
loss_rpn_bbox_fpn = model.net.SmoothL1Loss(
[
'rpn_bbox_pred_fpn' + slvl, 'rpn_bbox_targets_fpn' + slvl,
'rpn_bbox_inside_weights_fpn' + slvl,
'rpn_bbox_outside_weights_fpn' + slvl
],
'loss_rpn_bbox_fpn' + slvl,
beta=1. / 9.,
scale=1. / cfg.NUM_GPUS
)
loss_gradients.update(
blob_utils.
get_loss_gradients(model, [loss_rpn_cls_fpn, loss_rpn_bbox_fpn])
)
model.AddLosses(['loss_rpn_cls_fpn' + slvl, 'loss_rpn_bbox_fpn' + slvl])
return loss_gradients
# ---------------------------------------------------------------------------- #
# Helper functions for working with multilevel FPN RoIs
# ---------------------------------------------------------------------------- #
def map_rois_to_fpn_levels(rois, k_min, k_max):
"""Determine which FPN level each RoI in a set of RoIs should map to based
on the heuristic in the FPN paper.
"""
# Compute level ids
s = np.sqrt(box_utils.boxes_area(rois))
s0 = cfg.FPN.ROI_CANONICAL_SCALE # default: 224
lvl0 = cfg.FPN.ROI_CANONICAL_LEVEL # default: 4
# Eqn.(1) in FPN paper
target_lvls = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
target_lvls = np.clip(target_lvls, k_min, k_max)
return target_lvls
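# Worked example of Eqn. (1) with the defaults s0=224, lvl0=4: an RoI of area
# 224^2 maps to floor(4 + log2(1)) = 4; area 112^2 maps to level 3; and area
# 448^2 maps to level 5 (before clipping to [k_min, k_max]).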
def add_multilevel_roi_blobs(
blobs, blob_prefix, rois, target_lvls, lvl_min, lvl_max
):
"""Add RoI blobs for multiple FPN levels to the blobs dict.
blobs: a dict mapping from blob name to numpy ndarray
blob_prefix: name prefix to use for the FPN blobs
rois: the source rois as a 2D numpy array of shape (N, 5) where each row is
an roi and the columns encode (batch_idx, x1, y1, x2, y2)
target_lvls: numpy array of shape (N, ) indicating which FPN level each roi
in rois should be assigned to
lvl_min: the finest (highest resolution) FPN level (e.g., 2)
    lvl_max: the coarsest (lowest resolution) FPN level (e.g., 6)
"""
rois_idx_order = np.empty((0, ))
rois_stacked = np.zeros((0, 5), dtype=np.float32) # for assert
for lvl in range(lvl_min, lvl_max + 1):
idx_lvl = np.where(target_lvls == lvl)[0]
blobs[blob_prefix + '_fpn' + str(lvl)] = rois[idx_lvl, :]
rois_idx_order = np.concatenate((rois_idx_order, idx_lvl))
rois_stacked = np.vstack(
[rois_stacked, blobs[blob_prefix + '_fpn' + str(lvl)]]
)
rois_idx_restore = np.argsort(rois_idx_order).astype(np.int32, copy=False)
blobs[blob_prefix + '_idx_restore_int32'] = rois_idx_restore
# Sanity check that restore order is correct
assert (rois_stacked[rois_idx_restore] == rois).all()
# ---------------------------------------------------------------------------- #
# FPN level info for stages 5, 4, 3, 2 for select models (more can be added)
# ---------------------------------------------------------------------------- #
FpnLevelInfo = collections.namedtuple(
'FpnLevelInfo',
['blobs', 'dims', 'spatial_scales']
)
def fpn_level_info_ResNet50_conv5():
return FpnLevelInfo(
blobs=('res5_2_sum', 'res4_5_sum', 'res3_3_sum', 'res2_2_sum'),
dims=(2048, 1024, 512, 256),
spatial_scales=(1. / 32., 1. / 16., 1. / 8., 1. / 4.)
)
def fpn_level_info_ResNet101_conv5():
return FpnLevelInfo(
blobs=('res5_2_sum', 'res4_22_sum', 'res3_3_sum', 'res2_2_sum'),
dims=(2048, 1024, 512, 256),
spatial_scales=(1. / 32., 1. / 16., 1. / 8., 1. / 4.)
)
def fpn_level_info_ResNet152_conv5():
return FpnLevelInfo(
blobs=('res5_2_sum', 'res4_35_sum', 'res3_7_sum', 'res2_2_sum'),
dims=(2048, 1024, 512, 256),
spatial_scales=(1. / 32., 1. / 16., 1. / 8., 1. / 4.)
)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Implements ResNet and ResNeXt.
See: https://arxiv.org/abs/1512.03385, https://arxiv.org/abs/1611.05431.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
# ---------------------------------------------------------------------------- #
# Bits for specific architectures (ResNet50, ResNet101, ...)
# ---------------------------------------------------------------------------- #
def add_ResNet50_conv4_body(model):
return add_ResNet_convX_body(model, (3, 4, 6))
def add_ResNet50_conv5_body(model):
return add_ResNet_convX_body(model, (3, 4, 6, 3))
def add_ResNet101_conv4_body(model):
return add_ResNet_convX_body(model, (3, 4, 23))
def add_ResNet101_conv5_body(model):
return add_ResNet_convX_body(model, (3, 4, 23, 3))
def add_ResNet152_conv5_body(model):
return add_ResNet_convX_body(model, (3, 8, 36, 3))
# ---------------------------------------------------------------------------- #
# Generic ResNet components
# ---------------------------------------------------------------------------- #
def add_stage(
model,
prefix,
blob_in,
n,
dim_in,
dim_out,
dim_inner,
dilation,
stride_init=2
):
"""Add a ResNet stage to the model by stacking n residual blocks."""
# e.g., prefix = res2
for i in range(n):
blob_in = add_residual_block(
model,
'{}_{}'.format(prefix, i),
blob_in,
dim_in,
dim_out,
dim_inner,
dilation,
stride_init,
# Not using inplace for the last block;
# it may be fetched externally or used by FPN
inplace_sum=i < n - 1
)
dim_in = dim_out
return blob_in, dim_in
def add_ResNet_convX_body(model, block_counts, freeze_at=2):
"""Add a ResNet body from input data up through the res5 (aka conv5) stage.
The final res5/conv5 stage may be optionally excluded (hence convX, where
X = 4 or 5)."""
assert freeze_at in [0, 2, 3, 4, 5]
p = model.Conv('data', 'conv1', 3, 64, 7, pad=3, stride=2, no_bias=1)
p = model.AffineChannel(p, 'res_conv1_bn', inplace=True)
p = model.Relu(p, p)
p = model.MaxPool(p, 'pool1', kernel=3, pad=1, stride=2)
dim_in = 64
dim_bottleneck = cfg.RESNETS.NUM_GROUPS * cfg.RESNETS.WIDTH_PER_GROUP
(n1, n2, n3) = block_counts[:3]
s, dim_in = add_stage(model, 'res2', p, n1, dim_in, 256, dim_bottleneck, 1)
if freeze_at == 2:
model.StopGradient(s, s)
s, dim_in = add_stage(
model, 'res3', s, n2, dim_in, 512, dim_bottleneck * 2, 1
)
if freeze_at == 3:
model.StopGradient(s, s)
s, dim_in = add_stage(
model, 'res4', s, n3, dim_in, 1024, dim_bottleneck * 4, 1
)
if freeze_at == 4:
model.StopGradient(s, s)
if len(block_counts) == 4:
n4 = block_counts[3]
s, dim_in = add_stage(
model, 'res5', s, n4, dim_in, 2048, dim_bottleneck * 8,
cfg.RESNETS.RES5_DILATION
)
if freeze_at == 5:
model.StopGradient(s, s)
return s, dim_in, 1. / 32. * cfg.RESNETS.RES5_DILATION
else:
return s, dim_in, 1. / 16.
def add_ResNet_roi_conv5_head(model, blob_in, dim_in, spatial_scale):
"""Adds an RoI feature transformation (e.g., RoI pooling) followed by a
res5/conv5 head applied to each RoI."""
# TODO(rbg): This contains Fast R-CNN specific config options making it non-
# reusable; make this more generic with model-specific wrappers
model.RoIFeatureTransform(
blob_in,
'pool5',
blob_rois='rois',
method=cfg.FAST_RCNN.ROI_XFORM_METHOD,
resolution=cfg.FAST_RCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.FAST_RCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
dim_bottleneck = cfg.RESNETS.NUM_GROUPS * cfg.RESNETS.WIDTH_PER_GROUP
stride_init = int(cfg.FAST_RCNN.ROI_XFORM_RESOLUTION / 7)
s, dim_in = add_stage(
model, 'res5', 'pool5', 3, dim_in, 2048, dim_bottleneck * 8, 1,
stride_init
)
s = model.AveragePool(s, 'res5_pool', kernel=7)
return s, 2048
def add_residual_block(
model,
prefix,
blob_in,
dim_in,
dim_out,
dim_inner,
dilation,
stride_init=2,
inplace_sum=False
):
"""Add a residual block to the model."""
# prefix = res<stage>_<sub_stage>, e.g., res2_3
# Max pooling is performed prior to the first stage (which is uniquely
# distinguished by dim_in = 64), thus we keep stride = 1 for the first stage
stride = stride_init if (
dim_in != dim_out and dim_in != 64 and dilation == 1
) else 1
# transformation blob
tr = globals()[cfg.RESNETS.TRANS_FUNC](
model,
blob_in,
dim_in,
dim_out,
stride,
prefix,
dim_inner,
group=cfg.RESNETS.NUM_GROUPS,
dilation=dilation
)
# sum -> ReLU
sc = add_shortcut(model, prefix, blob_in, dim_in, dim_out, stride)
if inplace_sum:
s = model.net.Sum([tr, sc], tr)
else:
s = model.net.Sum([tr, sc], prefix + '_sum')
return model.Relu(s, s)
def add_shortcut(model, prefix, blob_in, dim_in, dim_out, stride):
if dim_in == dim_out:
return blob_in
c = model.Conv(
blob_in,
prefix + '_branch1',
dim_in,
dim_out,
kernel=1,
stride=stride,
no_bias=1
)
return model.AffineChannel(c, prefix + '_branch1_bn')
# ------------------------------------------------------------------------------
# Various transformations (this set may expand; consider a new helper if it
# does)
# ------------------------------------------------------------------------------
def bottleneck_transformation(
model,
blob_in,
dim_in,
dim_out,
stride,
prefix,
dim_inner,
dilation=1,
group=1
):
"""Add a bottleneck transformation to the model."""
# In original resnet, stride=2 is on 1x1.
# In fb.torch resnet, stride=2 is on 3x3.
(str1x1, str3x3) = (stride, 1) if cfg.RESNETS.STRIDE_1X1 else (1, stride)
# conv 1x1 -> BN -> ReLU
cur = model.ConvAffine(
blob_in,
prefix + '_branch2a',
dim_in,
dim_inner,
kernel=1,
stride=str1x1,
pad=0,
inplace=True
)
cur = model.Relu(cur, cur)
# conv 3x3 -> BN -> ReLU
cur = model.ConvAffine(
cur,
prefix + '_branch2b',
dim_inner,
dim_inner,
kernel=3,
stride=str3x3,
pad=1 * dilation,
dilation=dilation,
group=group,
inplace=True
)
cur = model.Relu(cur, cur)
# conv 1x1 -> BN (no ReLU)
# NB: for now this AffineChannel op cannot be in-place due to a bug in C2
# gradient computation for graphs like this
cur = model.ConvAffine(
cur,
prefix + '_branch2c',
dim_inner,
dim_out,
kernel=1,
stride=1,
pad=0,
inplace=False
)
return cur
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""VGG16 from https://arxiv.org/abs/1409.1556."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
def add_VGG16_conv5_body(model):
model.Conv('data', 'conv1_1', 3, 64, 3, pad=1, stride=1)
model.Relu('conv1_1', 'conv1_1')
model.Conv('conv1_1', 'conv1_2', 64, 64, 3, pad=1, stride=1)
model.Relu('conv1_2', 'conv1_2')
model.MaxPool('conv1_2', 'pool1', kernel=2, pad=0, stride=2)
model.Conv('pool1', 'conv2_1', 64, 128, 3, pad=1, stride=1)
model.Relu('conv2_1', 'conv2_1')
model.Conv('conv2_1', 'conv2_2', 128, 128, 3, pad=1, stride=1)
model.Relu('conv2_2', 'conv2_2')
model.MaxPool('conv2_2', 'pool2', kernel=2, pad=0, stride=2)
model.StopGradient('pool2', 'pool2')
model.Conv('pool2', 'conv3_1', 128, 256, 3, pad=1, stride=1)
model.Relu('conv3_1', 'conv3_1')
model.Conv('conv3_1', 'conv3_2', 256, 256, 3, pad=1, stride=1)
model.Relu('conv3_2', 'conv3_2')
model.Conv('conv3_2', 'conv3_3', 256, 256, 3, pad=1, stride=1)
model.Relu('conv3_3', 'conv3_3')
model.MaxPool('conv3_3', 'pool3', kernel=2, pad=0, stride=2)
model.Conv('pool3', 'conv4_1', 256, 512, 3, pad=1, stride=1)
model.Relu('conv4_1', 'conv4_1')
model.Conv('conv4_1', 'conv4_2', 512, 512, 3, pad=1, stride=1)
model.Relu('conv4_2', 'conv4_2')
model.Conv('conv4_2', 'conv4_3', 512, 512, 3, pad=1, stride=1)
model.Relu('conv4_3', 'conv4_3')
model.MaxPool('conv4_3', 'pool4', kernel=2, pad=0, stride=2)
model.Conv('pool4', 'conv5_1', 512, 512, 3, pad=1, stride=1)
model.Relu('conv5_1', 'conv5_1')
model.Conv('conv5_1', 'conv5_2', 512, 512, 3, pad=1, stride=1)
model.Relu('conv5_2', 'conv5_2')
model.Conv('conv5_2', 'conv5_3', 512, 512, 3, pad=1, stride=1)
blob_out = model.Relu('conv5_3', 'conv5_3')
return blob_out, 512, 1. / 16.
def add_VGG16_roi_fc_head(model, blob_in, dim_in, spatial_scale):
model.RoIFeatureTransform(
blob_in,
'pool5',
blob_rois='rois',
method=cfg.FAST_RCNN.ROI_XFORM_METHOD,
resolution=7,
sampling_ratio=cfg.FAST_RCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
model.FC('pool5', 'fc6', dim_in * 7 * 7, 4096)
model.Relu('fc6', 'fc6')
model.FC('fc6', 'fc7', 4096, 4096)
blob_out = model.Relu('fc7', 'fc7')
return blob_out, 4096
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""VGG_CNN_M_1024 from https://arxiv.org/abs/1405.3531."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
def add_VGG_CNN_M_1024_conv5_body(model):
model.Conv('data', 'conv1', 3, 96, 7, pad=0, stride=2)
model.Relu('conv1', 'conv1')
model.LRN('conv1', 'norm1', size=5, alpha=0.0005, beta=0.75, bias=2.)
model.MaxPool('norm1', 'pool1', kernel=3, pad=0, stride=2)
model.StopGradient('pool1', 'pool1')
# No updates at conv1 and below (norm1 and pool1 have no params,
# so we can stop gradients before them, too)
model.Conv('pool1', 'conv2', 96, 256, 5, pad=0, stride=2)
model.Relu('conv2', 'conv2')
model.LRN('conv2', 'norm2', size=5, alpha=0.0005, beta=0.75, bias=2.)
model.MaxPool('norm2', 'pool2', kernel=3, pad=0, stride=2)
model.Conv('pool2', 'conv3', 256, 512, 3, pad=1, stride=1)
model.Relu('conv3', 'conv3')
model.Conv('conv3', 'conv4', 512, 512, 3, pad=1, stride=1)
model.Relu('conv4', 'conv4')
model.Conv('conv4', 'conv5', 512, 512, 3, pad=1, stride=1)
blob_out = model.Relu('conv5', 'conv5')
return blob_out, 512, 1. / 16.
def add_VGG_CNN_M_1024_roi_fc_head(model, blob_in, dim_in, spatial_scale):
model.RoIFeatureTransform(
blob_in,
'pool5',
blob_rois='rois',
method=cfg.FAST_RCNN.ROI_XFORM_METHOD,
resolution=6,
sampling_ratio=cfg.FAST_RCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
model.FC('pool5', 'fc6', dim_in * 6 * 6, 4096)
model.Relu('fc6', 'fc6')
model.FC('fc6', 'fc7', 4096, 1024)
blob_out = model.Relu('fc7', 'fc7')
return blob_out, 1024
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Defines DetectionModelHelper, the class that represents a Detectron model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import logging
from caffe2.python import cnn
from caffe2.python import core
from caffe2.python import workspace
from core.config import cfg
from ops.collect_and_distribute_fpn_rpn_proposals \
import CollectAndDistributeFpnRpnProposalsOp
from ops.generate_proposal_labels import GenerateProposalLabelsOp
from ops.generate_proposals import GenerateProposalsOp
from utils import lr_policy
import roi_data.fast_rcnn
import utils.c2 as c2_utils
logger = logging.getLogger(__name__)
class DetectionModelHelper(cnn.CNNModelHelper):
def __init__(self, **kwargs):
# Handle args specific to the DetectionModelHelper, others pass through
# to CNNModelHelper
self.train = kwargs.get('train', False)
self.num_classes = kwargs.get('num_classes', -1)
assert self.num_classes > 0, 'num_classes must be > 0'
for k in ('train', 'num_classes'):
if k in kwargs:
del kwargs[k]
kwargs['order'] = 'NCHW'
# Defensively set cudnn_exhaustive_search to False in case the default
# changes in CNNModelHelper. The detection code uses variable size
# inputs that might not play nicely with cudnn_exhaustive_search.
kwargs['cudnn_exhaustive_search'] = False
super(DetectionModelHelper, self).__init__(**kwargs)
self.roi_data_loader = None
self.losses = []
self.metrics = []
        self.do_not_update_params = []  # Params on this list are not updated
self.net.Proto().type = cfg.MODEL.EXECUTION_TYPE
self.net.Proto().num_workers = cfg.NUM_GPUS * 4
self.prev_use_cudnn = self.use_cudnn
def TrainableParams(self, gpu_id=-1):
"""Get the blob names for all trainable parameters, possibly filtered by
GPU id.
"""
return [
p for p in self.params
if (
p in self.param_to_grad and # p has a gradient
p not in self.do_not_update_params and # not on the blacklist
(gpu_id == -1 or # filter for gpu assignment, if gpu_id set
str(p).find('gpu_{}'.format(gpu_id)) == 0)
)]
def AffineChannel(self, blob_in, blob_out, share_with=None, inplace=False):
"""Affine transformation to replace BN in networks where BN cannot be
used (e.g., because the minibatch size is too small).
The AffineChannel parameters may be shared with another AffineChannelOp
by specifying its blob name (excluding the '_{s,b}' suffix) in the
share_with argument. The operations can be done in place to save memory.
"""
blob_out = blob_out or self.net.NextName()
is_not_sharing = share_with is None
param_prefix = blob_out if is_not_sharing else share_with
scale = core.ScopedBlobReference(
param_prefix + '_s', self.param_init_net)
bias = core.ScopedBlobReference(
param_prefix + '_b', self.param_init_net)
if is_not_sharing:
self.net.Proto().external_input.extend([str(scale), str(bias)])
self.params.extend([scale, bias])
self.weights.append(scale)
self.biases.append(bias)
if inplace:
return self.net.AffineChannel([blob_in, scale, bias], blob_in)
else:
return self.net.AffineChannel([blob_in, scale, bias], blob_out)
def GenerateProposals(self, blobs_in, blobs_out, anchors, spatial_scale):
"""Op for generating RPN porposals.
blobs_in:
- 'rpn_cls_probs': 4D tensor of shape (N, A, H, W), where N is the
number of minibatch images, A is the number of anchors per
            location, and (H, W) is the spatial size of the prediction grid.
Each value represents a "probability of object" rating in [0, 1].
- 'rpn_bbox_pred': 4D tensor of shape (N, 4 * A, H, W) of predicted
            deltas used to transform anchor boxes into RPN proposals.
- 'im_info': 2D tensor of shape (N, 3) where the three columns encode
the input image's [height, width, scale]. Height and width are
for the input to the network, not the original image; scale is the
scale factor used to scale the original image to the network input
size.
blobs_out:
- 'rpn_rois': 2D tensor of shape (R, 5), for R RPN proposals where the
five columns encode [batch ind, x1, y1, x2, y2]. The boxes are
w.r.t. the network input, which is a *scaled* version of the
original image; these proposals must be scaled by 1 / scale (where
            scale comes from im_info; see above) to transform them back to the
original input image coordinate system.
- 'rpn_roi_probs': 1D tensor of objectness probability scores
(extracted from rpn_cls_probs; see above).
"""
name = 'GenerateProposalsOp:' + ','.join([str(b) for b in blobs_in])
self.net.Python(
GenerateProposalsOp(anchors, spatial_scale, self.train).forward
)(blobs_in, blobs_out, name=name)
return blobs_out
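    # Usage note (a sketch, not part of the op itself): since 'rpn_rois' are
    # in the scaled network-input coordinate frame, mapping a proposal back to
    # the original image amounts to dividing its box columns by the im_info
    # scale, e.g. for a single-image minibatch:
    #   boxes = rois[:, 1:5] / im_info[0, 2]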
def GenerateProposalLabels(self, blobs_in):
"""Op for generating training labels for RPN proposals. This is used
when training RPN jointly with Fast/Mask R-CNN (as in end-to-end
Faster R-CNN training).
blobs_in:
- 'rpn_rois': 2D tensor of RPN proposals output by GenerateProposals
- 'roidb': roidb entries that will be labeled
- 'im_info': See GenerateProposals doc.
blobs_out:
- (variable set of blobs): returns whatever blobs are required for
training the model. It does this by querying the data loader for
the list of blobs that are needed.
"""
name = 'GenerateProposalLabelsOp:' + ','.join(
[str(b) for b in blobs_in]
)
# The list of blobs is not known before run-time because it depends on
# the specific model being trained. Query the data loader to get the
# list of output blob names.
blobs_out = roi_data.fast_rcnn.get_fast_rcnn_blob_names(
is_training=self.train
)
blobs_out = [core.ScopedBlobReference(b) for b in blobs_out]
self.net.Python(GenerateProposalLabelsOp().forward)(
blobs_in, blobs_out, name=name
)
return blobs_out
def CollectAndDistributeFpnRpnProposals(self):
"""Merge RPN proposals generated at multiple FPN levels and then
distribute those proposals to their appropriate FPN levels. An anchor
at one FPN level may predict an RoI that will map to another level,
hence the need to redistribute the proposals.
This function assumes standard blob names for input and output blobs.
Input blobs: [rpn_rois_fpn<min>, ..., rpn_rois_fpn<max>,
rpn_roi_probs_fpn<min>, ..., rpn_roi_probs_fpn<max>]
- rpn_rois_fpn<i> are the RPN proposals for FPN level i; see rpn_rois
documentation from GenerateProposals.
- rpn_roi_probs_fpn<i> are the RPN objectness probabilities for FPN
level i; see rpn_roi_probs documentation from GenerateProposals.
If used during training, then the input blobs will also include:
[roidb, im_info] (see GenerateProposalLabels).
        Output blobs: [rois_fpn<min>, ..., rois_fpn<max>, rois,
                       rois_idx_restore]
- rois_fpn<i> are the RPN proposals for FPN level i
- rois_idx_restore is a permutation on the concatenation of all
rois_fpn<i>, i=min...max, such that when applied the RPN RoIs are
restored to their original order in the input blobs.
If used during training, then the output blobs will also include:
[labels, bbox_targets, bbox_inside_weights, bbox_outside_weights].
"""
k_max = cfg.FPN.RPN_MAX_LEVEL
k_min = cfg.FPN.RPN_MIN_LEVEL
# Prepare input blobs
rois_names = ['rpn_rois_fpn' + str(l) for l in range(k_min, k_max + 1)]
score_names = [
'rpn_roi_probs_fpn' + str(l) for l in range(k_min, k_max + 1)
]
blobs_in = rois_names + score_names
if self.train:
blobs_in += ['roidb', 'im_info']
blobs_in = [core.ScopedBlobReference(b) for b in blobs_in]
name = 'CollectAndDistributeFpnRpnProposalsOp:' + ','.join(
[str(b) for b in blobs_in]
)
# Prepare output blobs
blobs_out = roi_data.fast_rcnn.get_fast_rcnn_blob_names(
is_training=self.train
)
blobs_out = [core.ScopedBlobReference(b) for b in blobs_out]
outputs = self.net.Python(
CollectAndDistributeFpnRpnProposalsOp(self.train).forward
)(blobs_in, blobs_out, name=name)
return outputs
def DropoutIfTraining(self, blob_in, dropout_rate):
"""Add dropout to blob_in if the model is in training mode and
dropout_rate is > 0."""
blob_out = blob_in
if self.train and dropout_rate > 0:
blob_out = self.Dropout(
blob_in, blob_in, ratio=dropout_rate, is_test=False
)
return blob_out
def RoIFeatureTransform(
self,
blobs_in,
blob_out,
blob_rois='rois',
method='RoIPoolF',
resolution=7,
spatial_scale=1. / 16.,
sampling_ratio=0
):
"""Add the specified RoI pooling method. The sampling_ratio argument
is supported for some, but not all, RoI transform methods.
RoIFeatureTransform abstracts away:
- Use of FPN or not
- Specifics of the transform method
"""
assert method in {'RoIPoolF', 'RoIAlign'}, \
'Unknown pooling method: {}'.format(method)
has_argmax = (method == 'RoIPoolF')
if isinstance(blobs_in, list):
# FPN case: add RoIFeatureTransform to each FPN level
k_max = cfg.FPN.ROI_MAX_LEVEL # coarsest level of pyramid
k_min = cfg.FPN.ROI_MIN_LEVEL # finest level of pyramid
assert len(blobs_in) == k_max - k_min + 1
bl_out_list = []
for lvl in range(k_min, k_max + 1):
bl_in = blobs_in[k_max - lvl] # blobs_in is in reversed order
sc = spatial_scale[k_max - lvl] # in reversed order
bl_rois = blob_rois + '_fpn' + str(lvl)
bl_out = blob_out + '_fpn' + str(lvl)
bl_out_list.append(bl_out)
bl_argmax = ['_argmax_' + bl_out] if has_argmax else []
self.net.__getattr__(method)(
[bl_in, bl_rois], [bl_out] + bl_argmax,
pooled_w=resolution,
pooled_h=resolution,
spatial_scale=sc,
sampling_ratio=sampling_ratio
)
# The pooled features from all levels are concatenated along the
# batch dimension into a single 4D tensor.
xform_shuffled, _ = self.net.Concat(
bl_out_list, [blob_out + '_shuffled', '_concat_' + blob_out],
axis=0
)
# Unshuffle to match rois from dataloader
restore_bl = blob_rois + '_idx_restore_int32'
xform_out = self.net.BatchPermutation(
[xform_shuffled, restore_bl], blob_out
)
else:
# Single feature level
bl_argmax = ['_argmax_' + blob_out] if has_argmax else []
# sampling_ratio is ignored for RoIPoolF
xform_out = self.net.__getattr__(method)(
[blobs_in, blob_rois], [blob_out] + bl_argmax,
pooled_w=resolution,
pooled_h=resolution,
spatial_scale=spatial_scale,
sampling_ratio=sampling_ratio
)
# Only return the first blob (the transformed features)
return xform_out
def ConvShared(
self,
blob_in,
blob_out,
dim_in,
dim_out,
kernel,
weight=None,
bias=None,
**kwargs
):
"""Add conv op that shares weights and/or biases with another conv op.
"""
        use_bias = not kwargs.get('no_bias', False)
if self.use_cudnn:
kwargs['engine'] = 'CUDNN'
kwargs['exhaustive_search'] = self.cudnn_exhaustive_search
if self.ws_nbytes_limit:
kwargs['ws_nbytes_limit'] = self.ws_nbytes_limit
if use_bias:
blobs_in = [blob_in, weight, bias]
else:
blobs_in = [blob_in, weight]
if 'no_bias' in kwargs:
del kwargs['no_bias']
return self.net.Conv(
blobs_in, blob_out, kernel=kernel, order=self.order, **kwargs
)
def BilinearInterpolation(
self, blob_in, blob_out, dim_in, dim_out, up_scale
):
"""Bilinear interpolation in space of scale.
Takes input of NxKxHxW and outputs NxKx(sH)x(sW), where s:= up_scale
Adapted from the CVPR'15 FCN code.
See: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/surgery.py
"""
assert dim_in == dim_out
assert up_scale % 2 == 0, 'Scale should be even'
def upsample_filt(size):
factor = (size + 1) // 2
if size % 2 == 1:
center = factor - 1
else:
center = factor - 0.5
og = np.ogrid[:size, :size]
return ((1 - abs(og[0] - center) / factor) *
(1 - abs(og[1] - center) / factor))
kernel_size = up_scale * 2
bil_filt = upsample_filt(kernel_size)
kernel = np.zeros(
(dim_in, dim_out, kernel_size, kernel_size), dtype=np.float32
)
kernel[range(dim_out), range(dim_in), :, :] = bil_filt
blob = self.ConvTranspose(
blob_in,
blob_out,
dim_in,
dim_out,
kernel_size,
stride=int(up_scale),
pad=int(up_scale / 2),
weight_init=('GivenTensorFill', {'values': kernel}),
bias_init=('ConstantFill', {'value': 0.})
)
self.do_not_update_params.append(self.weights[-1])
self.do_not_update_params.append(self.biases[-1])
return blob
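    # For up_scale=2 the generated kernel_size is 4 and upsample_filt(4)
    # produces the separable bilinear weights [0.25, 0.75, 0.75, 0.25] (outer
    # product in 2D), i.e. the classic FCN-style fixed deconv filter; these
    # params are then frozen via do_not_update_params.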
def ConvAffine( # args in the same order of Conv()
self, blob_in, prefix, dim_in, dim_out, kernel, stride, pad,
group=1, dilation=1,
weight_init=None,
bias_init=None,
suffix='_bn',
inplace=False
):
"""ConvAffine adds a Conv op followed by a AffineChannel op (which
replaces BN during fine tuning).
"""
conv_blob = self.Conv(
blob_in,
prefix,
dim_in,
dim_out,
kernel,
stride=stride,
pad=pad,
group=group,
dilation=dilation,
weight_init=weight_init,
bias_init=bias_init,
no_bias=1
)
blob_out = self.AffineChannel(
conv_blob, prefix + suffix, inplace=inplace
)
return blob_out
def DisableCudnn(self):
self.prev_use_cudnn = self.use_cudnn
self.use_cudnn = False
def RestorePreviousUseCudnn(self):
prev_use_cudnn = self.use_cudnn
self.use_cudnn = self.prev_use_cudnn
self.prev_use_cudnn = prev_use_cudnn
def UpdateWorkspaceLr(self, cur_iter):
"""Updates the model's current learning rate and the workspace (learning
rate and update history/momentum blobs).
"""
# The workspace is the one source of truth for the lr
# The lr is always the same on all GPUs
cur_lr = workspace.FetchBlob('gpu_0/lr')[0]
new_lr = lr_policy.get_lr_at_iter(cur_iter)
# There are no type conversions between the lr in Python and the lr in
        # the GPU (both are float32), so exact comparison is ok
if cur_lr != new_lr:
ratio = _get_lr_change_ratio(cur_lr, new_lr)
if ratio > cfg.SOLVER.LOG_LR_CHANGE_THRESHOLD:
logger.info(
'Changing learning rate {:.6f} -> {:.6f} at iter {:d}'.
format(cur_lr, new_lr, cur_iter))
self._SetNewLr(cur_lr, new_lr)
return new_lr
def _SetNewLr(self, cur_lr, new_lr):
"""Do the actual work of updating the model and workspace blobs.
"""
for i in range(cfg.NUM_GPUS):
with c2_utils.CudaScope(i):
workspace.FeedBlob(
'gpu_{}/lr'.format(i), np.array([new_lr], dtype=np.float32))
ratio = _get_lr_change_ratio(cur_lr, new_lr)
if cfg.SOLVER.SCALE_MOMENTUM and cur_lr > 1e-7 and \
ratio > cfg.SOLVER.SCALE_MOMENTUM_THRESHOLD:
self._CorrectMomentum(new_lr / cur_lr)
def _CorrectMomentum(self, correction):
"""The MomentumSGDUpdate op implements the update V as
V := mu * V + lr * grad,
where mu is the momentum factor, lr is the learning rate, and grad is
the stochastic gradient. Since V is not defined independently of the
learning rate (as it should ideally be), when the learning rate is
changed we should scale the update history V in order to make it
compatible in scale with lr * grad.
"""
logger.info(
'Scaling update history by {:.6f} (new lr / old lr)'.
format(correction))
for i in range(cfg.NUM_GPUS):
with c2_utils.CudaScope(i):
for param in self.TrainableParams(gpu_id=i):
op = core.CreateOperator(
'Scale', [param + '_momentum'], [param + '_momentum'],
scale=correction)
workspace.RunOperatorOnce(op)
def AddLosses(self, losses):
if not isinstance(losses, list):
losses = [losses]
# Conversion to str allows losses to include BlobReferences
losses = [c2_utils.UnscopeName(str(l)) for l in losses]
self.losses = list(set(self.losses + losses))
def AddMetrics(self, metrics):
if not isinstance(metrics, list):
metrics = [metrics]
self.metrics = list(set(self.metrics + metrics))
def _get_lr_change_ratio(cur_lr, new_lr):
eps = 1e-10
ratio = np.max(
(new_lr / np.max((cur_lr, eps)), cur_lr / np.max((new_lr, eps)))
)
return ratio
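# E.g., _get_lr_change_ratio(0.01, 0.001) == 10.0: the ratio is symmetric in
# cur_lr and new_lr, so a 10x decrease and a 10x increase are logged (and
# momentum-corrected) in the same way.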
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Various network "heads" for classification and bounding box prediction.
The design is as follows:
... -> RoI    ----\                               /-> box cls output -> cls loss
                   -> RoIFeatureXform -> box head
... -> Feature ---/                               \-> box reg output -> reg loss
    Map
The Fast R-CNN head produces a feature representation of the RoI for the purpose
of bounding box classification and regression. The box output module converts
the feature representation into classification and regression predictions.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
from utils.c2 import const_fill
from utils.c2 import gauss_fill
import utils.blob as blob_utils
# ---------------------------------------------------------------------------- #
# Fast R-CNN outputs and losses
# ---------------------------------------------------------------------------- #
def add_fast_rcnn_outputs(model, blob_in, dim):
"""Add RoI classification and bounding box regression output ops."""
model.FC(
blob_in,
'cls_score',
dim,
model.num_classes,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
if not model.train: # == if test
# Only add softmax when testing; during training the softmax is combined
# with the label cross entropy loss for numerical stability
model.Softmax('cls_score', 'cls_prob', engine='CUDNN')
model.FC(
blob_in,
'bbox_pred',
dim,
model.num_classes * 4,
weight_init=gauss_fill(0.001),
bias_init=const_fill(0.0)
)
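# Editor's sizing note (example values are assumptions, e.g. COCO's
# num_classes = 81): for R input RoIs the two FC ops above produce
# 'cls_score' with shape (R, 81) and 'bbox_pred' with shape (R, 4 * 81);
# at test time 'cls_prob' additionally holds the per-class softmax scores.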
def add_fast_rcnn_losses(model):
"""Add losses for RoI classification and bounding box regression."""
cls_prob, loss_cls = model.net.SoftmaxWithLoss(
['cls_score', 'labels_int32'], ['cls_prob', 'loss_cls'],
scale=1. / cfg.NUM_GPUS
)
loss_bbox = model.net.SmoothL1Loss(
[
'bbox_pred', 'bbox_targets', 'bbox_inside_weights',
'bbox_outside_weights'
],
'loss_bbox',
scale=1. / cfg.NUM_GPUS
)
loss_gradients = blob_utils.get_loss_gradients(model, [loss_cls, loss_bbox])
model.Accuracy(['cls_prob', 'labels_int32'], 'accuracy_cls')
model.AddLosses(['loss_cls', 'loss_bbox'])
model.AddMetrics('accuracy_cls')
return loss_gradients
# ---------------------------------------------------------------------------- #
# Box heads
# ---------------------------------------------------------------------------- #
def add_roi_2mlp_head(model, blob_in, dim_in, spatial_scale):
"""Add a ReLU MLP with two hidden layers."""
hidden_dim = cfg.FAST_RCNN.MLP_HEAD_DIM
roi_size = cfg.FAST_RCNN.ROI_XFORM_RESOLUTION
roi_feat = model.RoIFeatureTransform(
blob_in,
'roi_feat',
blob_rois='rois',
method=cfg.FAST_RCNN.ROI_XFORM_METHOD,
resolution=roi_size,
sampling_ratio=cfg.FAST_RCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
model.FC(roi_feat, 'fc6', dim_in * roi_size * roi_size, hidden_dim)
model.Relu('fc6', 'fc6')
model.FC('fc6', 'fc7', hidden_dim, hidden_dim)
model.Relu('fc7', 'fc7')
return 'fc7', hidden_dim
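# Editor's worked sizing sketch (the values are common FPN defaults, assumed
# here, not read from a config): with dim_in = 256, ROI_XFORM_RESOLUTION = 7
# and MLP_HEAD_DIM = 1024, 'fc6' maps the flattened 256 * 7 * 7 = 12544-dim
# RoI feature to 1024 dims, 'fc7' maps 1024 -> 1024, and the head returns
# ('fc7', 1024).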
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
import numpy as np
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
#
# >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
# >> anchors
#
# anchors =
#
# -83 -39 100 56
# -175 -87 192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360
# array([[ -83., -39., 100., 56.],
# [-175., -87., 192., 104.],
# [-359., -183., 376., 200.],
# [ -55., -55., 72., 72.],
# [-119., -119., 136., 136.],
# [-247., -247., 264., 264.],
# [ -35., -79., 52., 96.],
# [ -79., -167., 96., 184.],
# [-167., -343., 184., 360.]])
def generate_anchors(
stride=16, sizes=(32, 64, 128, 256, 512), aspect_ratios=(0.5, 1, 2)
):
"""Generates a matrix of anchor boxes in (x1, y1, x2, y2) format. Anchors
are centered on stride / 2, have (approximate) sqrt areas of the specified
sizes, and aspect ratios as given.
"""
return _generate_anchors(
stride,
np.array(sizes, dtype=np.float) / stride,
np.array(aspect_ratios, dtype=np.float)
)
def _generate_anchors(base_size, scales, aspect_ratios):
"""Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, base_size - 1, base_size - 1) window.
"""
anchor = np.array([1, 1, base_size, base_size], dtype=np.float) - 1
anchors = _ratio_enum(anchor, aspect_ratios)
anchors = np.vstack(
[_scale_enum(anchors[i, :], scales) for i in range(anchors.shape[0])]
)
return anchors
def _whctrs(anchor):
"""Return width, height, x center, and y center for an anchor (window)."""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws = ws[:, np.newaxis]
hs = hs[:, np.newaxis]
anchors = np.hstack(
(
x_ctr - 0.5 * (ws - 1),
y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1),
y_ctr + 0.5 * (hs - 1)
)
)
return anchors
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors for each aspect ratio wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
ws = np.round(np.sqrt(size_ratios))
hs = np.round(ws * ratios)
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors for each scale wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = w * scales
hs = h * scales
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
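# Editor's usage sketch (not part of the original file): the 9 anchors
# documented above correspond to the classic Faster R-CNN settings, while the
# Detectron defaults give 5 sizes x 3 aspect ratios = 15 anchors:
#
#   a = generate_anchors(
#       stride=16, sizes=(128, 256, 512), aspect_ratios=(0.5, 1, 2)
#   )
#   assert a.shape == (9, 4)
#   assert generate_anchors().shape == (15, 4)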
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Various network "heads" for predicting keypoints in Mask R-CNN.
The design is as follows:
... -> RoI    ----\
                   -> RoIFeatureXform -> keypoint head -> keypoint output -> loss
... -> Feature ---/
    Map
The keypoint head produces a feature representation of the RoI for the purpose
of keypoint prediction. The keypoint output module converts the feature
representation into keypoint heatmaps.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
from utils.c2 import const_fill
from utils.c2 import gauss_fill
import modeling.ResNet as ResNet
import utils.blob as blob_utils
# ---------------------------------------------------------------------------- #
# Keypoint R-CNN outputs and losses
# ---------------------------------------------------------------------------- #
def add_keypoint_outputs(model, blob_in, dim):
"""Add Mask R-CNN keypoint specific outputs: keypoint heatmaps."""
# NxKxHxW
upsample_heatmap = (cfg.KRCNN.UP_SCALE > 1)
if cfg.KRCNN.USE_DECONV:
# Apply ConvTranspose to the feature representation; results in 2x
# upsampling
blob_in = model.ConvTranspose(
blob_in,
'kps_deconv',
dim,
cfg.KRCNN.DECONV_DIM,
kernel=cfg.KRCNN.DECONV_KERNEL,
pad=int(cfg.KRCNN.DECONV_KERNEL / 2 - 1),
stride=2,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
model.Relu('kps_deconv', 'kps_deconv')
dim = cfg.KRCNN.DECONV_DIM
if upsample_heatmap:
blob_name = 'kps_score_lowres'
else:
blob_name = 'kps_score'
if cfg.KRCNN.USE_DECONV_OUTPUT:
# Use ConvTranspose to predict heatmaps; results in 2x upsampling
blob_out = model.ConvTranspose(
blob_in,
blob_name,
dim,
cfg.KRCNN.NUM_KEYPOINTS,
kernel=cfg.KRCNN.DECONV_KERNEL,
pad=int(cfg.KRCNN.DECONV_KERNEL / 2 - 1),
stride=2,
weight_init=(cfg.KRCNN.CONV_INIT, {'std': 0.001}),
bias_init=const_fill(0.0)
)
else:
# Use Conv to predict heatmaps; does no upsampling
blob_out = model.Conv(
blob_in,
blob_name,
dim,
cfg.KRCNN.NUM_KEYPOINTS,
kernel=1,
pad=0,
stride=1,
weight_init=(cfg.KRCNN.CONV_INIT, {'std': 0.001}),
bias_init=const_fill(0.0)
)
if upsample_heatmap:
# Increase heatmap output size via bilinear upsampling
blob_out = model.BilinearInterpolation(
blob_out, 'kps_score', cfg.KRCNN.NUM_KEYPOINTS,
cfg.KRCNN.NUM_KEYPOINTS, cfg.KRCNN.UP_SCALE
)
return blob_out
def add_keypoint_losses(model):
"""Add Mask R-CNN keypoint specific losses."""
# Reshape input from (N, K, H, W) to (NK, HW)
model.net.Reshape(
['kps_score'], ['kps_score_reshaped', '_kps_score_old_shape'],
shape=(-1, cfg.KRCNN.HEATMAP_SIZE * cfg.KRCNN.HEATMAP_SIZE)
)
# Softmax across **space** (woahh....space!)
# Note: this is not what is commonly called "spatial softmax"
# (i.e., softmax applied along the channel dimension at each spatial
# location); This is softmax applied over a set of spatial locations (i.e.,
# each spatial location is a "class").
kps_prob, loss_kps = model.net.SoftmaxWithLoss(
['kps_score_reshaped', 'keypoint_locations_int32', 'keypoint_weights'],
['kps_prob', 'loss_kps'],
scale=cfg.KRCNN.LOSS_WEIGHT / cfg.NUM_GPUS,
spatial=0
)
if not cfg.KRCNN.NORMALIZE_BY_VISIBLE_KEYPOINTS:
# Discussion: the softmax loss above will average the loss by the sum of
# keypoint_weights, i.e. the total number of visible keypoints. Since
# the number of visible keypoints can vary significantly between
# minibatches, this has the effect of up-weighting the importance of
# minibatches with few visible keypoints. (Imagine the extreme case of
# only one visible keypoint versus N: in the case of N, each one
# contributes 1/N to the gradient compared to the single keypoint
# determining the gradient direction). Instead, we can normalize the
# loss by the total number of keypoints, if it were the case that all
# keypoints were visible in a full minibatch. (Returning to the example,
# this means that the one visible keypoint contributes as much as each
# of the N keypoints.)
model.StopGradient(
'keypoint_loss_normalizer', 'keypoint_loss_normalizer'
)
loss_kps = model.net.Mul(
['loss_kps', 'keypoint_loss_normalizer'], 'loss_kps_normalized'
)
loss_gradients = blob_utils.get_loss_gradients(model, [loss_kps])
model.AddLosses(loss_kps)
return loss_gradients
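# Editor's sketch of the softmax-across-space computation above (shapes only;
# numpy stands in for the Caffe2 Reshape/SoftmaxWithLoss ops):
#
#   import numpy as np
#   N, K, H, W = 2, 17, 56, 56                  # e.g., 17 COCO keypoints
#   scores = np.random.randn(N, K, H, W)
#   flat = scores.reshape(N * K, H * W)         # one row per (RoI, keypoint)
#   e = np.exp(flat - flat.max(axis=1, keepdims=True))
#   prob = e / e.sum(axis=1, keepdims=True)     # distribution over locations
#   assert np.allclose(prob.sum(axis=1), 1.0)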
# ---------------------------------------------------------------------------- #
# Keypoint heads
# ---------------------------------------------------------------------------- #
def add_ResNet_roi_conv5_head_for_keypoints(
model, blob_in, dim_in, spatial_scale
):
"""Add a ResNet "conv5" / "stage5" head for Mask R-CNN keypoint prediction.
"""
model.RoIFeatureTransform(
blob_in,
'_[pose]_pool5',
blob_rois='keypoint_rois',
method=cfg.KRCNN.ROI_XFORM_METHOD,
resolution=cfg.KRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.KRCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
    # Adding the prefix '_[pose]_' to 'res5' allows the head's parameters to
    # be initialized from pretrained 'res5' parameters when a weights file is
    # given (see utils.net.initialize_gpu_0_from_weights_file)
s, dim_in = ResNet.add_stage(
model,
'_[pose]_res5',
'_[pose]_pool5',
3,
dim_in,
2048,
512,
cfg.KRCNN.DILATION,
stride_init=int(cfg.KRCNN.ROI_XFORM_RESOLUTION / 7)
)
return s, 2048
def add_roi_pose_head_v1convX(model, blob_in, dim_in, spatial_scale):
"""Add a Mask R-CNN keypoint head. v1convX design: X * (conv)."""
hidden_dim = cfg.KRCNN.CONV_HEAD_DIM
kernel_size = cfg.KRCNN.CONV_HEAD_KERNEL
pad_size = kernel_size // 2
current = model.RoIFeatureTransform(
blob_in,
'_[pose]_roi_feat',
blob_rois='keypoint_rois',
method=cfg.KRCNN.ROI_XFORM_METHOD,
resolution=cfg.KRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.KRCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
for i in range(cfg.KRCNN.NUM_STACKED_CONVS):
current = model.Conv(
current,
'conv_fcn' + str(i + 1),
dim_in,
hidden_dim,
kernel_size,
stride=1,
pad=pad_size,
weight_init=(cfg.KRCNN.CONV_INIT, {'std': 0.01}),
bias_init=('ConstantFill', {'value': 0.})
)
current = model.Relu(current, current)
dim_in = hidden_dim
return current, hidden_dim
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Various network "heads" for predicting masks in Mask R-CNN.
The design is as follows:
... -> RoI    ----\
                   -> RoIFeatureXform -> mask head -> mask output -> loss
... -> Feature ---/
    Map
The mask head produces a feature representation of the RoI for the purpose
of mask prediction. The mask output module converts the feature representation
into real-valued (soft) masks.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
from utils.c2 import const_fill
from utils.c2 import gauss_fill
import modeling.ResNet as ResNet
import utils.blob as blob_utils
# ---------------------------------------------------------------------------- #
# Mask R-CNN outputs and losses
# ---------------------------------------------------------------------------- #
def add_mask_rcnn_outputs(model, blob_in, dim):
"""Add Mask R-CNN specific outputs: either mask logits or probs."""
num_cls = cfg.MODEL.NUM_CLASSES if cfg.MRCNN.CLS_SPECIFIC_MASK else 1
if cfg.MRCNN.USE_FC_OUTPUT:
# Predict masks with a fully connected layer (ignore 'fcn' in the blob
# name)
blob_out = model.FC(
blob_in,
'mask_fcn_logits',
dim,
num_cls * cfg.MRCNN.RESOLUTION**2,
weight_init=gauss_fill(0.001),
bias_init=const_fill(0.0)
)
else:
# Predict mask using Conv
# Use GaussianFill for class-agnostic mask prediction; fills based on
# fan-in can be too large in this case and cause divergence
fill = (
cfg.MRCNN.CONV_INIT
if cfg.MRCNN.CLS_SPECIFIC_MASK else 'GaussianFill'
)
blob_out = model.Conv(
blob_in,
'mask_fcn_logits',
dim,
num_cls,
kernel=1,
pad=0,
stride=1,
weight_init=(fill, {'std': 0.001}),
bias_init=const_fill(0.0)
)
if cfg.MRCNN.UPSAMPLE_RATIO > 1:
blob_out = model.BilinearInterpolation(
'mask_fcn_logits', 'mask_fcn_logits_up', num_cls, num_cls,
cfg.MRCNN.UPSAMPLE_RATIO
)
if not model.train: # == if test
blob_out = model.net.Sigmoid(blob_out, 'mask_fcn_probs')
return blob_out
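# Editor's sizing note (example values are assumptions): with class-specific
# masks, num_classes = 81 and cfg.MRCNN.RESOLUTION = 28, the conv output above
# has shape (R, 81, 28, 28) for R mask RoIs; the FC variant predicts the same
# masks as a flat (R, 81 * 28 * 28) blob.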
def add_mask_rcnn_losses(model, blob_mask):
"""Add Mask R-CNN specific losses."""
loss_mask = model.net.SigmoidCrossEntropyLoss(
[blob_mask, 'masks_int32'],
'loss_mask',
scale=1. / cfg.NUM_GPUS * cfg.MRCNN.WEIGHT_LOSS_MASK
)
loss_gradients = blob_utils.get_loss_gradients(model, [loss_mask])
model.AddLosses('loss_mask')
return loss_gradients
# ---------------------------------------------------------------------------- #
# Mask heads
# ---------------------------------------------------------------------------- #
def mask_rcnn_fcn_head_v1up4convs(model, blob_in, dim_in, spatial_scale):
"""v1up design: 4 * (conv 3x3), convT 2x2."""
return mask_rcnn_fcn_head_v1upXconvs(
model, blob_in, dim_in, spatial_scale, 4
)
def mask_rcnn_fcn_head_v1up(model, blob_in, dim_in, spatial_scale):
"""v1up design: 2 * (conv 3x3), convT 2x2."""
return mask_rcnn_fcn_head_v1upXconvs(
model, blob_in, dim_in, spatial_scale, 2
)
def mask_rcnn_fcn_head_v1upXconvs(
model, blob_in, dim_in, spatial_scale, num_convs
):
"""v1upXconvs design: X * (conv 3x3), convT 2x2."""
current = model.RoIFeatureTransform(
blob_in,
blob_out='_[mask]_roi_feat',
blob_rois='mask_rois',
method=cfg.MRCNN.ROI_XFORM_METHOD,
resolution=cfg.MRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.MRCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
dilation = cfg.MRCNN.DILATION
dim_inner = cfg.MRCNN.DIM_REDUCED
for i in range(num_convs):
current = model.Conv(
current,
'_[mask]_fcn' + str(i + 1),
dim_in,
dim_inner,
kernel=3,
pad=1 * dilation,
stride=1,
weight_init=(cfg.MRCNN.CONV_INIT, {'std': 0.001}),
bias_init=('ConstantFill', {'value': 0.})
)
current = model.Relu(current, current)
dim_in = dim_inner
# upsample layer
model.ConvTranspose(
current,
'conv5_mask',
dim_inner,
dim_inner,
kernel=2,
pad=0,
stride=2,
weight_init=(cfg.MRCNN.CONV_INIT, {'std': 0.001}),
bias_init=const_fill(0.0)
)
blob_mask = model.Relu('conv5_mask', 'conv5_mask')
return blob_mask, dim_inner
def mask_rcnn_fcn_head_v0upshare(model, blob_in, dim_in, spatial_scale):
"""Use a ResNet "conv5" / "stage5" head for mask prediction. Weights and
computation are shared with the conv5 box head. Computation can only be
shared during training, since inference is cascaded.
v0upshare design: conv5, convT 2x2.
"""
# Since box and mask head are shared, these must match
assert cfg.MRCNN.ROI_XFORM_RESOLUTION == cfg.FAST_RCNN.ROI_XFORM_RESOLUTION
if model.train: # share computation with bbox head at training time
dim_conv5 = 2048
blob_conv5 = model.net.SampleAs(
['res5_2_sum', 'roi_has_mask_int32'],
['_[mask]_res5_2_sum_sliced']
)
else: # re-compute at test time
blob_conv5, dim_conv5 = add_ResNet_roi_conv5_head_for_masks(
model,
blob_in,
dim_in,
spatial_scale
)
dim_reduced = cfg.MRCNN.DIM_REDUCED
blob_mask = model.ConvTranspose(
blob_conv5,
'conv5_mask',
dim_conv5,
dim_reduced,
kernel=2,
pad=0,
stride=2,
weight_init=(cfg.MRCNN.CONV_INIT, {'std': 0.001}), # std only for gauss
bias_init=const_fill(0.0)
)
model.Relu('conv5_mask', 'conv5_mask')
return blob_mask, dim_reduced
def mask_rcnn_fcn_head_v0up(model, blob_in, dim_in, spatial_scale):
"""v0up design: conv5, deconv 2x2 (no weight sharing with the box head)."""
blob_conv5, dim_conv5 = add_ResNet_roi_conv5_head_for_masks(
model,
blob_in,
dim_in,
spatial_scale
)
dim_reduced = cfg.MRCNN.DIM_REDUCED
model.ConvTranspose(
blob_conv5,
'conv5_mask',
dim_conv5,
dim_reduced,
kernel=2,
pad=0,
stride=2,
weight_init=('GaussianFill', {'std': 0.001}),
bias_init=const_fill(0.0)
)
blob_mask = model.Relu('conv5_mask', 'conv5_mask')
return blob_mask, dim_reduced
def add_ResNet_roi_conv5_head_for_masks(model, blob_in, dim_in, spatial_scale):
"""Add a ResNet "conv5" / "stage5" head for predicting masks."""
model.RoIFeatureTransform(
blob_in,
blob_out='_[mask]_pool5',
blob_rois='mask_rois',
method=cfg.MRCNN.ROI_XFORM_METHOD,
resolution=cfg.MRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.MRCNN.ROI_XFORM_SAMPLING_RATIO,
spatial_scale=spatial_scale
)
dilation = cfg.MRCNN.DILATION
stride_init = int(cfg.MRCNN.ROI_XFORM_RESOLUTION / 7) # by default: 2
s, dim_in = ResNet.add_stage(
model,
'_[mask]_res5',
'_[mask]_pool5',
3,
dim_in,
2048,
512,
dilation,
stride_init=stride_init
)
return s, 2048
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Detectron model construction functions.
Detectron supports a large number of model types. The configuration space is
large. To get a sense, a given model is an element in the Cartesian product of:
- backbone (e.g., VGG16, ResNet, ResNeXt)
- FPN (on or off)
- RPN only (just proposals)
- Fixed proposals for Fast R-CNN, RFCN, Mask R-CNN (with or without keypoints)
- End-to-end model with RPN + Fast R-CNN (i.e., Faster R-CNN), Mask R-CNN, ...
- Different "head" choices for the model
- ... many configuration options ...
A given model is made by combining many basic components. The result is flexible
though somewhat complex to understand at first.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import importlib
import logging
from caffe2.python import core
from caffe2.python import workspace
from core.config import cfg
from modeling.detector import DetectionModelHelper
from roi_data.loader import RoIDataLoader
import modeling.fast_rcnn_heads as fast_rcnn_heads
import modeling.keypoint_rcnn_heads as keypoint_rcnn_heads
import modeling.mask_rcnn_heads as mask_rcnn_heads
import modeling.name_compat
import modeling.optimizer as optim
import modeling.retinanet_heads as retinanet_heads
import modeling.rfcn_heads as rfcn_heads
import modeling.rpn_heads as rpn_heads
import roi_data.minibatch
import utils.c2 as c2_utils
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------- #
# Generic recomposable model builders
#
# For example, you can create a Fast R-CNN model with the ResNet-50-C4 backbone
# with the configuration:
#
# MODEL:
# TYPE: generalized_rcnn
# CONV_BODY: ResNet.add_ResNet50_conv4_body
# ROI_HEAD: ResNet.add_ResNet_roi_conv5_head
# ---------------------------------------------------------------------------- #
def generalized_rcnn(model):
"""This model type handles:
- Fast R-CNN
- RPN only (not integrated with Fast R-CNN)
- Faster R-CNN (stagewise training from NIPS paper)
- Faster R-CNN (end-to-end joint training)
- Mask R-CNN (stagewise training from NIPS paper)
- Mask R-CNN (end-to-end joint training)
"""
return build_generic_detection_model(
model,
get_func(cfg.MODEL.CONV_BODY),
add_roi_box_head_func=get_func(cfg.FAST_RCNN.ROI_BOX_HEAD),
add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
add_roi_keypoint_head_func=get_func(cfg.KRCNN.ROI_KEYPOINTS_HEAD),
freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
)
def rfcn(model):
# TODO(rbg): fold into build_generic_detection_model
return build_generic_rfcn_model(model, get_func(cfg.MODEL.CONV_BODY))
def retinanet(model):
# TODO(rbg): fold into build_generic_detection_model
return build_generic_retinanet_model(model, get_func(cfg.MODEL.CONV_BODY))
# ---------------------------------------------------------------------------- #
# Helper functions for building various re-usable network bits
# ---------------------------------------------------------------------------- #
def create(model_type_func, train=False):
"""Generic model creation function that dispatches to specific model
building functions.
"""
model = DetectionModelHelper(
name=model_type_func,
train=train,
num_classes=cfg.MODEL.NUM_CLASSES,
init_params=train
)
return get_func(model_type_func)(model)
def get_func(func_name):
"""Helper to return a function object by name. func_name must identify a
function in this module or the path to a function relative to the base
'modeling' module.
"""
if func_name == '':
return None
new_func_name = modeling.name_compat.get_new_name(func_name)
if new_func_name != func_name:
logger.warn(
'Remapping old function name: {} -> {}'.
format(func_name, new_func_name)
)
func_name = new_func_name
try:
parts = func_name.split('.')
# Refers to a function in this module
if len(parts) == 1:
return globals()[parts[0]]
# Otherwise, assume we're referencing a module under modeling
module_name = 'modeling.' + '.'.join(parts[:-1])
module = importlib.import_module(module_name)
return getattr(module, parts[-1])
except Exception:
logger.error('Failed to find function: {}'.format(func_name))
raise
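# Editor's usage sketch (function names taken from configs in this codebase):
#
#   get_func('generalized_rcnn')
#   # -> resolved in this module's globals()
#   get_func('ResNet.add_ResNet50_conv4_body')
#   # -> imports modeling.ResNet and returns its add_ResNet50_conv4_body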
def build_generic_detection_model(
model,
add_conv_body_func,
add_roi_box_head_func=None,
add_roi_mask_head_func=None,
add_roi_keypoint_head_func=None,
freeze_conv_body=False
):
def _single_gpu_build_func(model):
"""Build the model on a single GPU. Can be called in a loop over GPUs
with name and device scoping to create a data parallel model.
"""
# Add the conv body (called "backbone architecture" in papers)
# E.g., ResNet-50, ResNet-50-FPN, ResNeXt-101-FPN, etc.
blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
if freeze_conv_body:
for b in c2_utils.BlobReferenceList(blob_conv):
model.StopGradient(b, b)
if not model.train: # == inference
# Create a net that can be used to execute the conv body on an image
# (without also executing RPN or any other network heads)
model.conv_body_net = model.net.Clone('conv_body_net')
head_loss_gradients = {
'rpn': None,
'box': None,
'mask': None,
'keypoints': None,
}
if cfg.RPN.RPN_ON:
# Add the RPN head
head_loss_gradients['rpn'] = rpn_heads.add_generic_rpn_outputs(
model, blob_conv, dim_conv, spatial_scale_conv
)
if cfg.FPN.FPN_ON:
# After adding the RPN head, restrict FPN blobs and scales to
# those used in the RoI heads
blob_conv, spatial_scale_conv = _narrow_to_fpn_roi_levels(
blob_conv, spatial_scale_conv
)
if not cfg.MODEL.RPN_ONLY:
# Add the Fast R-CNN head
head_loss_gradients['box'] = _add_fast_rcnn_head(
model, add_roi_box_head_func, blob_conv, dim_conv,
spatial_scale_conv
)
if cfg.MODEL.MASK_ON:
# Add the mask head
head_loss_gradients['mask'] = _add_roi_mask_head(
model, add_roi_mask_head_func, blob_conv, dim_conv,
spatial_scale_conv
)
if cfg.MODEL.KEYPOINTS_ON:
# Add the keypoint head
                head_loss_gradients['keypoints'] = _add_roi_keypoint_head(
model, add_roi_keypoint_head_func, blob_conv, dim_conv,
spatial_scale_conv
)
if model.train:
loss_gradients = {}
for lg in head_loss_gradients.values():
if lg is not None:
loss_gradients.update(lg)
return loss_gradients
else:
return None
optim.build_data_parallel_model(model, _single_gpu_build_func)
return model
def _narrow_to_fpn_roi_levels(blobs, spatial_scales):
"""Return only the blobs and spatial scales that will be used for RoI heads.
Inputs `blobs` and `spatial_scales` may include extra blobs and scales that
are used for RPN proposals, but not for RoI heads.
"""
# Code only supports case when RPN and ROI min levels are the same
assert cfg.FPN.RPN_MIN_LEVEL == cfg.FPN.ROI_MIN_LEVEL
# RPN max level can be >= to ROI max level
assert cfg.FPN.RPN_MAX_LEVEL >= cfg.FPN.ROI_MAX_LEVEL
# FPN RPN max level might be > FPN ROI max level in which case we
# need to discard some leading conv blobs (blobs are ordered from
# max/coarsest level to min/finest level)
num_roi_levels = cfg.FPN.ROI_MAX_LEVEL - cfg.FPN.ROI_MIN_LEVEL + 1
return blobs[-num_roi_levels:], spatial_scales[-num_roi_levels:]
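# Editor's sketch (hypothetical level settings): with RPN levels 2..6 and RoI
# levels 2..5, num_roi_levels = 4 and the coarsest (P6) entry is dropped from
# the front of the coarsest-to-finest ordered inputs:
#
#   blobs = ['p6', 'p5', 'p4', 'p3', 'p2']
#   scales = [1 / 64., 1 / 32., 1 / 16., 1 / 8., 1 / 4.]
#   # -> (['p5', 'p4', 'p3', 'p2'], [1 / 32., 1 / 16., 1 / 8., 1 / 4.])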
def _add_fast_rcnn_head(
model, add_roi_box_head_func, blob_in, dim_in, spatial_scale_in
):
"""Add a Fast R-CNN head to the model."""
blob_frcn, dim_frcn = add_roi_box_head_func(
model, blob_in, dim_in, spatial_scale_in
)
fast_rcnn_heads.add_fast_rcnn_outputs(model, blob_frcn, dim_frcn)
if model.train:
loss_gradients = fast_rcnn_heads.add_fast_rcnn_losses(model)
else:
loss_gradients = None
return loss_gradients
def _add_roi_mask_head(
model, add_roi_mask_head_func, blob_in, dim_in, spatial_scale_in
):
"""Add a mask prediction head to the model."""
# Capture model graph before adding the mask head
bbox_net = copy.deepcopy(model.net.Proto())
# Add the mask head
blob_mask_head, dim_mask_head = add_roi_mask_head_func(
model, blob_in, dim_in, spatial_scale_in
)
# Add the mask output
blob_mask = mask_rcnn_heads.add_mask_rcnn_outputs(
model, blob_mask_head, dim_mask_head
)
if not model.train: # == inference
# Inference uses a cascade of box predictions, then mask predictions.
# This requires separate nets for box and mask prediction.
# So we extract the mask prediction net, store it as its own network,
# then restore model.net to be the bbox-only network
model.mask_net, blob_mask = c2_utils.SuffixNet(
'mask_net', model.net, len(bbox_net.op), blob_mask
)
model.net._net = bbox_net
loss_gradients = None
else:
loss_gradients = mask_rcnn_heads.add_mask_rcnn_losses(model, blob_mask)
return loss_gradients
def _add_roi_keypoint_head(
model, add_roi_keypoint_head_func, blob_in, dim_in, spatial_scale_in
):
"""Add a keypoint prediction head to the model."""
    # Capture model graph before adding the keypoint head
bbox_net = copy.deepcopy(model.net.Proto())
# Add the keypoint head
blob_keypoint_head, dim_keypoint_head = add_roi_keypoint_head_func(
model, blob_in, dim_in, spatial_scale_in
)
# Add the keypoint output
blob_keypoint = keypoint_rcnn_heads.add_keypoint_outputs(
model, blob_keypoint_head, dim_keypoint_head
)
if not model.train: # == inference
# Inference uses a cascade of box predictions, then keypoint predictions
# This requires separate nets for box and keypoint prediction.
# So we extract the keypoint prediction net, store it as its own
# network, then restore model.net to be the bbox-only network
model.keypoint_net, keypoint_blob_out = c2_utils.SuffixNet(
'keypoint_net', model.net, len(bbox_net.op), blob_keypoint
)
model.net._net = bbox_net
loss_gradients = None
else:
loss_gradients = keypoint_rcnn_heads.add_keypoint_losses(model)
return loss_gradients
def build_generic_rfcn_model(model, add_conv_body_func, dim_reduce=None):
# TODO(rbg): fold this function into build_generic_detection_model
def _single_gpu_build_func(model):
"""Builds the model on a single GPU. Can be called in a loop over GPUs
with name and device scoping to create a data parallel model."""
blob, dim, spatial_scale = add_conv_body_func(model)
if not model.train:
model.conv_body_net = model.net.Clone('conv_body_net')
rfcn_heads.add_rfcn_outputs(model, blob, dim, dim_reduce, spatial_scale)
if model.train:
loss_gradients = fast_rcnn_heads.add_fast_rcnn_losses(model)
return loss_gradients if model.train else None
optim.build_data_parallel_model(model, _single_gpu_build_func)
return model
def build_generic_retinanet_model(
model, add_conv_body_func, freeze_conv_body=False
):
# TODO(rbg): fold this function into build_generic_detection_model
def _single_gpu_build_func(model):
"""Builds the model on a single GPU. Can be called in a loop over GPUs
with name and device scoping to create a data parallel model."""
blobs, dim, spatial_scales = add_conv_body_func(model)
retinanet_heads.add_fpn_retinanet_outputs(
model, blobs, dim, spatial_scales
)
if model.train:
loss_gradients = retinanet_heads.add_fpn_retinanet_losses(
model
)
return loss_gradients if model.train else None
optim.build_data_parallel_model(model, _single_gpu_build_func)
return model
# ---------------------------------------------------------------------------- #
# Network inputs
# ---------------------------------------------------------------------------- #
def add_training_inputs(model, roidb=None):
"""Create network input ops and blobs used for training. To be called
*after* model_builder.create().
"""
# Implementation notes:
# Typically, one would create the input ops and then the rest of the net.
# However, creating the input ops depends on loading the dataset, which
# can take a few minutes for COCO.
# We prefer to avoid waiting so debugging can fail fast.
# Thus, we create the net *without input ops* prior to loading the
# dataset, and then add the input ops after loading the dataset.
# Since we defer input op creation, we need to do a little bit of surgery
# to place the input ops at the start of the network op list.
assert model.train, 'Training inputs can only be added to a trainable model'
if roidb is not None:
# To make debugging easier you can set cfg.DATA_LOADER.NUM_THREADS = 1
model.roi_data_loader = RoIDataLoader(
roidb, num_loaders=cfg.DATA_LOADER.NUM_THREADS
)
orig_num_op = len(model.net._net.op)
blob_names = roi_data.minibatch.get_minibatch_blob_names(
is_training=True
)
for gpu_id in range(cfg.NUM_GPUS):
with c2_utils.NamedCudaScope(gpu_id):
for blob_name in blob_names:
workspace.CreateBlob(core.ScopedName(blob_name))
model.net.DequeueBlobs(
model.roi_data_loader._blobs_queue_name, blob_names
)
# A little op surgery to move input ops to the start of the net
diff = len(model.net._net.op) - orig_num_op
new_op = model.net._net.op[-diff:] + model.net._net.op[:-diff]
del model.net._net.op[:]
model.net._net.op.extend(new_op)
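# Editor's sketch of the op surgery above (pure list manipulation): if the op
# list is [f1, f2, f3, in1, in2] after the deferred input ops are added, then
# diff = 2 and ops[-diff:] + ops[:-diff] = [in1, in2, f1, f2, f3], i.e., the
# input ops are rotated to the front of the net.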
def add_inference_inputs(model):
"""Create network input blobs used for inference."""
def create_input_blobs_for_net(net_def):
for op in net_def.op:
for blob_in in op.input:
if not workspace.HasBlob(blob_in):
workspace.CreateBlob(blob_in)
create_input_blobs_for_net(model.net.Proto())
if cfg.MODEL.MASK_ON:
create_input_blobs_for_net(model.mask_net.Proto())
if cfg.MODEL.KEYPOINTS_ON:
create_input_blobs_for_net(model.keypoint_net.Proto())
# ---------------------------------------------------------------------------- #
# ********************** DEPRECATED FUNCTIONALITY BELOW ********************** #
# ---------------------------------------------------------------------------- #
# ---------------------------------------------------------------------------- #
# Hardcoded functions to create various types of common models
#
# *** This type of model definition is deprecated ***
# *** Use the generic composable versions instead ***
#
# ---------------------------------------------------------------------------- #
import modeling.ResNet as ResNet
import modeling.VGG16 as VGG16
import modeling.VGG_CNN_M_1024 as VGG_CNN_M_1024
def fast_rcnn(model):
logger.warn('Deprecated: use `MODEL.TYPE: generalized_rcnn`.')
return generalized_rcnn(model)
def mask_rcnn(model):
logger.warn(
'Deprecated: use `MODEL.TYPE: generalized_rcnn` with '
'`MODEL.MASK_ON: True`'
)
return generalized_rcnn(model)
def keypoint_rcnn(model):
logger.warn(
'Deprecated: use `MODEL.TYPE: generalized_rcnn` with '
'`MODEL.KEYPOINTS_ON: True`'
)
return generalized_rcnn(model)
def mask_and_keypoint_rcnn(model):
logger.warn(
'Deprecated: use `MODEL.TYPE: generalized_rcnn` with '
        '`MODEL.MASK_ON: True` and `MODEL.KEYPOINTS_ON: True`'
)
return generalized_rcnn(model)
def rpn(model):
logger.warn(
'Deprecated: use `MODEL.TYPE: generalized_rcnn` with '
'`MODEL.RPN_ONLY: True`'
)
return generalized_rcnn(model)
def fpn_rpn(model):
logger.warn(
'Deprecated: use `MODEL.TYPE: generalized_rcnn` with '
'`MODEL.RPN_ONLY: True` and FPN enabled via configs'
)
return generalized_rcnn(model)
def faster_rcnn(model):
logger.warn(
'Deprecated: use `MODEL.TYPE: generalized_rcnn` with '
'`MODEL.FASTER_RCNN: True`'
)
return generalized_rcnn(model)
def fast_rcnn_frozen_features(model):
logger.warn('Deprecated: use `TRAIN.FREEZE_CONV_BODY: True` instead')
return build_generic_detection_model(
model,
get_func(cfg.MODEL.CONV_BODY),
add_roi_box_head_func=get_func(cfg.FAST_RCNN.ROI_BOX_HEAD),
freeze_conv_body=True
)
def rpn_frozen_features(model):
logger.warn('Deprecated: use `TRAIN.FREEZE_CONV_BODY: True` instead')
return build_generic_detection_model(
model, get_func(cfg.MODEL.CONV_BODY), freeze_conv_body=True
)
def fpn_rpn_frozen_features(model):
logger.warn('Deprecated: use `TRAIN.FREEZE_CONV_BODY: True` instead')
return build_generic_detection_model(
model, get_func(cfg.MODEL.CONV_BODY), freeze_conv_body=True
)
def mask_rcnn_frozen_features(model):
logger.warn('Deprecated: use `TRAIN.FREEZE_CONV_BODY: True` instead')
return build_generic_detection_model(
model,
get_func(cfg.MODEL.CONV_BODY),
add_roi_box_head_func=get_func(cfg.FAST_RCNN.ROI_BOX_HEAD),
add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
freeze_conv_body=True
)
def keypoint_rcnn_frozen_features(model):
logger.warn('Deprecated: use `TRAIN.FREEZE_CONV_BODY: True` instead')
return build_generic_detection_model(
model,
get_func(cfg.MODEL.CONV_BODY),
add_roi_box_head_func=get_func(cfg.FAST_RCNN.ROI_BOX_HEAD),
add_roi_keypoint_head_func=get_func(cfg.KRCNN.ROI_KEYPOINTS_HEAD),
freeze_conv_body=True
)
# ---------------------------------------------------------------------------- #
# Fast R-CNN models
# ---------------------------------------------------------------------------- #
def VGG_CNN_M_1024_fast_rcnn(model):
return build_generic_detection_model(
model, VGG_CNN_M_1024.add_VGG_CNN_M_1024_conv5_body,
VGG_CNN_M_1024.add_VGG_CNN_M_1024_roi_fc_head
)
def VGG16_fast_rcnn(model):
return build_generic_detection_model(
model, VGG16.add_VGG16_conv5_body, VGG16.add_VGG16_roi_fc_head
)
def ResNet50_fast_rcnn(model):
return build_generic_detection_model(
model, ResNet.add_ResNet50_conv4_body, ResNet.add_ResNet_roi_conv5_head
)
def ResNet101_fast_rcnn(model):
return build_generic_detection_model(
model, ResNet.add_ResNet101_conv4_body, ResNet.add_ResNet_roi_conv5_head
)
def ResNet50_fast_rcnn_frozen_features(model):
return build_generic_detection_model(
model,
ResNet.add_ResNet50_conv4_body,
ResNet.add_ResNet_roi_conv5_head,
freeze_conv_body=True
)
def ResNet101_fast_rcnn_frozen_features(model):
return build_generic_detection_model(
model,
ResNet.add_ResNet101_conv4_body,
ResNet.add_ResNet_roi_conv5_head,
freeze_conv_body=True
)
# ---------------------------------------------------------------------------- #
# RPN-only models
# ---------------------------------------------------------------------------- #
def VGG_CNN_M_1024_rpn(model):
return build_generic_detection_model(
model, VGG_CNN_M_1024.add_VGG_CNN_M_1024_conv5_body
)
def VGG16_rpn(model):
return build_generic_detection_model(model, VGG16.add_VGG16_conv5_body)
def ResNet50_rpn_conv4(model):
return build_generic_detection_model(model, ResNet.add_ResNet50_conv4_body)
def ResNet101_rpn_conv4(model):
return build_generic_detection_model(model, ResNet.add_ResNet101_conv4_body)
def VGG_CNN_M_1024_rpn_frozen_features(model):
return build_generic_detection_model(
model,
VGG_CNN_M_1024.add_VGG_CNN_M_1024_conv5_body,
freeze_conv_body=True
)
def VGG16_rpn_frozen_features(model):
return build_generic_detection_model(
model, VGG16.add_VGG16_conv5_body, freeze_conv_body=True
)
def ResNet50_rpn_conv4_frozen_features(model):
return build_generic_detection_model(
model, ResNet.add_ResNet50_conv4_body, freeze_conv_body=True
)
def ResNet101_rpn_conv4_frozen_features(model):
return build_generic_detection_model(
model, ResNet.add_ResNet101_conv4_body, freeze_conv_body=True
)
# ---------------------------------------------------------------------------- #
# Faster R-CNN models
# ---------------------------------------------------------------------------- #
def VGG16_faster_rcnn(model):
assert cfg.MODEL.FASTER_RCNN
return build_generic_detection_model(
model, VGG16.add_VGG16_conv5_body, VGG16.add_VGG16_roi_fc_head
)
def ResNet50_faster_rcnn(model):
assert cfg.MODEL.FASTER_RCNN
return build_generic_detection_model(
model, ResNet.add_ResNet50_conv4_body, ResNet.add_ResNet_roi_conv5_head
)
def ResNet101_faster_rcnn(model):
assert cfg.MODEL.FASTER_RCNN
return build_generic_detection_model(
model, ResNet.add_ResNet101_conv4_body, ResNet.add_ResNet_roi_conv5_head
)
# ---------------------------------------------------------------------------- #
# R-FCN models
# ---------------------------------------------------------------------------- #
def ResNet50_rfcn(model):
return build_generic_rfcn_model(
model, ResNet.add_ResNet50_conv5_body, dim_reduce=1024
)
def ResNet101_rfcn(model):
return build_generic_rfcn_model(
model, ResNet.add_ResNet101_conv5_body, dim_reduce=1024
)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Handle mapping from old network building function names to new names.
Flexible network configuration is achieved by specifying the function name that
builds a network module (e.g., the name of the conv backbone or the mask roi
head). However we may wish to change names over time without breaking previous
config files. This module provides backwards naming compatibility by providing
a mapping from the old name to the new name.
When renaming functions, it's generally a good idea to codemod existing yaml
config files. An easy way to batch edit is, for example, a shell command like
$ find . -name "*.yaml" -exec sed -i -e \
's/head_builder\.add_roi_2mlp_head/fast_rcnn_heads.add_roi_2mlp_head/g' {} \;
to perform the renaming:
head_builder.add_roi_2mlp_head => fast_rcnn_heads.add_roi_2mlp_head
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
_RENAME = {
# Removed "ResNet_" from the name because it wasn't relevent
'mask_rcnn_heads.ResNet_mask_rcnn_fcn_head_v1up4convs':
'mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs',
# Removed "ResNet_" from the name because it wasn't relevent
'mask_rcnn_heads.ResNet_mask_rcnn_fcn_head_v1up':
'mask_rcnn_heads.mask_rcnn_fcn_head_v1up',
# Removed "ResNet_" from the name because it wasn't relevent
'mask_rcnn_heads.ResNet_mask_rcnn_fcn_head_v0upshare':
'mask_rcnn_heads.mask_rcnn_fcn_head_v0upshare',
# Removed "ResNet_" from the name because it wasn't relevent
'mask_rcnn_heads.ResNet_mask_rcnn_fcn_head_v0up':
'mask_rcnn_heads.mask_rcnn_fcn_head_v0up',
# Removed head_builder module in favor of the more specific fast_rcnn name
'head_builder.add_roi_2mlp_head':
'fast_rcnn_heads.add_roi_2mlp_head',
}
def get_new_name(func_name):
if func_name in _RENAME:
func_name = _RENAME[func_name]
return func_name
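# Editor's usage sketch:
#
#   get_new_name('head_builder.add_roi_2mlp_head')
#   # -> 'fast_rcnn_heads.add_roi_2mlp_head'
#   get_new_name('some.unmapped.name')  # unmapped names pass through unchanged
#   # -> 'some.unmapped.name'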
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Optimization operator graph construction."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
from caffe2.python import muji
from core.config import cfg
import utils.c2 as c2_utils
logger = logging.getLogger(__name__)
def build_data_parallel_model(model, single_gpu_build_func):
"""Build a data parallel model given a function that builds the model on a
single GPU.
"""
if model.train:
all_loss_gradients = _build_forward_graph(model, single_gpu_build_func)
# Add backward pass on all GPUs
model.AddGradientOperators(all_loss_gradients)
if cfg.NUM_GPUS > 1:
_add_allreduce_graph(model)
for gpu_id in range(cfg.NUM_GPUS):
# After allreduce, all GPUs perform SGD updates on their identical
# params and gradients in parallel
_add_parameter_update_ops(model, gpu_id)
else:
# Test-time network operates on single GPU
# Test-time parallelism is implemented through multiprocessing
with c2_utils.NamedCudaScope(0):
single_gpu_build_func(model)
def _build_forward_graph(model, single_gpu_build_func):
"""Construct the forward graph on each GPU."""
all_loss_gradients = {} # Will include loss gradients from all GPUs
# Build the model on each GPU with correct name and device scoping
for gpu_id in range(cfg.NUM_GPUS):
with c2_utils.NamedCudaScope(gpu_id):
all_loss_gradients.update(single_gpu_build_func(model))
return all_loss_gradients
def _add_allreduce_graph(model):
"""Construct the graph that performs Allreduce on the gradients."""
# Need to all-reduce the per-GPU gradients if training with more than 1 GPU
all_params = model.TrainableParams()
assert len(all_params) % cfg.NUM_GPUS == 0
# The model parameters are replicated on each GPU, get the number
# distinct parameter blobs (i.e., the number of parameter blobs on
# each GPU)
params_per_gpu = int(len(all_params) / cfg.NUM_GPUS)
with c2_utils.CudaScope(0):
# Iterate over distinct parameter blobs
for i in range(params_per_gpu):
# Gradients from all GPUs for this parameter blob
gradients = [
model.param_to_grad[p] for p in all_params[i::params_per_gpu]
]
if len(gradients) > 0:
if cfg.USE_NCCL:
model.net.NCCLAllreduce(gradients, gradients)
else:
muji.Allreduce(model.net, gradients, reduced_affix='')
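# Editor's sketch of the gradient grouping above (hypothetical blob names):
# with 2 GPUs and 2 trainable params per GPU, TrainableParams() is GPU-major,
#
#   all_params = ['gpu_0/w', 'gpu_0/b', 'gpu_1/w', 'gpu_1/b']
#
# so params_per_gpu = 2 and all_params[0::2] = ['gpu_0/w', 'gpu_1/w'] gathers
# the replicas of one parameter across GPUs for a single Allreduce.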
def _add_parameter_update_ops(model, gpu_id):
"""Construct the optimizer update op graph."""
with c2_utils.NamedCudaScope(gpu_id):
# Learning rate of 0 is a dummy value to be set properly at the
# start of training
lr = model.param_init_net.ConstantFill(
[], 'lr', shape=[1], value=0.0
)
one = model.param_init_net.ConstantFill(
[], 'one', shape=[1], value=1.0
)
wd = model.param_init_net.ConstantFill(
[], 'wd', shape=[1], value=cfg.SOLVER.WEIGHT_DECAY
)
for param in model.TrainableParams(gpu_id=gpu_id):
logger.info('param ' + str(param) + ' will be updated')
param_grad = model.param_to_grad[param]
# Initialize momentum vector
param_momentum = model.param_init_net.ConstantFill(
[param], param + '_momentum', value=0.0
)
if param in model.biases:
# Special treatment for biases (mainly to match historical impl.
# details):
# (1) Do not apply weight decay
# (2) Use a 2x higher learning rate
model.Scale(param_grad, param_grad, scale=2.0)
elif cfg.SOLVER.WEIGHT_DECAY > 0:
# Apply weight decay to non-bias weights
model.WeightedSum([param_grad, one, param, wd], param_grad)
# Update param_grad and param_momentum in place
model.net.MomentumSGDUpdate(
[param_grad, param_momentum, lr, param],
[param_grad, param_momentum, param],
momentum=cfg.SOLVER.MOMENTUM
)
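# Editor's note: per parameter, the ops above implement (numpy-like pseudocode;
# mu = cfg.SOLVER.MOMENTUM, wd = cfg.SOLVER.WEIGHT_DECAY):
#
#   if param is a bias: grad = 2.0 * grad         # 2x lr, no weight decay
#   elif wd > 0:        grad = grad + wd * param  # L2 regularization
#   V = mu * V + lr * grad                        # MomentumSGDUpdate...
#   param = param - V                             # ...also applies the step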
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""RetinaNet model heads and losses. See: https://arxiv.org/abs/1708.02002."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
from core.config import cfg
import utils.blob as blob_utils
def get_retinanet_bias_init(model):
"""Initialize the biases for the conv ops that predict class probabilities.
Initialization is performed such that at the start of training, all
locations are predicted to be background with high probability
(e.g., ~0.99 = 1 - cfg.RETINANET.PRIOR_PROB). See the Focal Loss paper for
details.
"""
prior_prob = cfg.RETINANET.PRIOR_PROB
scales_per_octave = cfg.RETINANET.SCALES_PER_OCTAVE
aspect_ratios = len(cfg.RETINANET.ASPECT_RATIOS)
if cfg.RETINANET.SOFTMAX:
# Multiclass softmax case
bias = np.zeros((model.num_classes, 1), dtype=np.float32)
bias[0] = np.log(
(model.num_classes - 1) * (1 - prior_prob) / (prior_prob)
)
bias = np.vstack(
[bias for _ in range(scales_per_octave * aspect_ratios)]
)
bias_init = (
'GivenTensorFill', {
'values': bias.astype(dtype=np.float32)
}
)
else:
# Per-class sigmoid (binary classification) case
bias_init = (
'ConstantFill', {
'value': -np.log((1 - prior_prob) / prior_prob)
}
)
return bias_init
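# Editor's worked example (sigmoid case, PRIOR_PROB = 0.01 assumed): the bias
# is initialized to -log(0.99 / 0.01) ~= -4.595, and sigmoid(-4.595) = 0.01,
# so every location starts out predicted as ~99% background:
#
#   import numpy as np
#   prior_prob = 0.01
#   b = -np.log((1 - prior_prob) / prior_prob)
#   assert np.isclose(1. / (1. + np.exp(-b)), prior_prob)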
def add_fpn_retinanet_outputs(model, blobs_in, dim_in, spatial_scales):
"""RetinaNet head. For classification and box regression, we can chose to
have the same conv tower or a separate tower. "bl_feat_list" stores the list
of feature blobs for bbox prediction. These blobs can be shared cls feature
blobs if we share the tower or else are independent blobs.
"""
dim_out = dim_in
k_max = cfg.FPN.RPN_MAX_LEVEL # coarsest level of pyramid
k_min = cfg.FPN.RPN_MIN_LEVEL # finest level of pyramid
A = len(cfg.RETINANET.ASPECT_RATIOS) * cfg.RETINANET.SCALES_PER_OCTAVE
# compute init for bias
bias_init = get_retinanet_bias_init(model)
assert len(blobs_in) == k_max - k_min + 1
bbox_feat_list = []
cls_pred_dim = (
model.num_classes if cfg.RETINANET.SOFTMAX else (model.num_classes - 1)
)
    # dim of the bbox regression output: 4 per (non-background) class when
    # class-specific boxes are on, else a single class-agnostic 4
bbox_regr_dim = (
4 * (model.num_classes - 1) if cfg.RETINANET.CLASS_SPECIFIC_BBOX else 4
)
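    # Editor's sizing note (example values are assumptions): with 3 aspect
    # ratios and SCALES_PER_OCTAVE = 3, A = 9; for COCO (num_classes = 81,
    # sigmoid case) the cls convs below output 80 * 9 = 720 channels and the
    # class-agnostic bbox convs output 4 * 9 = 36 channels per FPN level.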
# ==========================================================================
# classification tower with logits and prob prediction
# ==========================================================================
for lvl in range(k_min, k_max + 1):
bl_in = blobs_in[k_max - lvl] # blobs_in is in reversed order
# classification tower stack convolution starts
for nconv in range(cfg.RETINANET.NUM_CONVS):
suffix = 'n{}_fpn{}'.format(nconv, lvl)
            dim_out = dim_in  # the tower convs preserve the channel dim
if lvl == k_min:
bl_out = model.Conv(
bl_in,
'retnet_cls_conv_' + suffix,
dim_in,
dim_out,
3,
stride=1,
pad=1,
weight_init=('GaussianFill', {
'std': 0.01
}),
bias_init=('ConstantFill', {
'value': 0.
})
)
else:
bl_out = model.ConvShared(
bl_in,
'retnet_cls_conv_' + suffix,
dim_in,
dim_out,
3,
stride=1,
pad=1,
weight='retnet_cls_conv_n{}_fpn{}_w'.format(nconv, k_min),
bias='retnet_cls_conv_n{}_fpn{}_b'.format(nconv, k_min)
)
bl_in = model.Relu(bl_out, bl_out)
bl_feat = bl_in
# cls tower stack convolution ends. Add the logits layer now
if lvl == k_min:
retnet_cls_pred = model.Conv(
bl_feat,
'retnet_cls_pred_fpn{}'.format(lvl),
dim_in,
cls_pred_dim * A,
3,
pad=1,
stride=1,
weight_init=('GaussianFill', {
'std': 0.01
}),
bias_init=bias_init
)
else:
retnet_cls_pred = model.ConvShared(
bl_feat,
'retnet_cls_pred_fpn{}'.format(lvl),
dim_in,
cls_pred_dim * A,
3,
pad=1,
stride=1,
weight='retnet_cls_pred_fpn{}_w'.format(k_min),
bias='retnet_cls_pred_fpn{}_b'.format(k_min)
)
if not model.train:
if cfg.RETINANET.SOFTMAX:
model.net.GroupSpatialSoftmax(
retnet_cls_pred,
'retnet_cls_prob_fpn{}'.format(lvl),
num_classes=cls_pred_dim
)
else:
model.net.Sigmoid(
retnet_cls_pred, 'retnet_cls_prob_fpn{}'.format(lvl)
)
if cfg.RETINANET.SHARE_CLS_BBOX_TOWER:
bbox_feat_list.append(bl_feat)
# ==========================================================================
# bbox tower if not sharing features with the classification tower with
# logits and prob prediction
# ==========================================================================
if not cfg.RETINANET.SHARE_CLS_BBOX_TOWER:
for lvl in range(k_min, k_max + 1):
bl_in = blobs_in[k_max - lvl] # blobs_in is in reversed order
for nconv in range(cfg.RETINANET.NUM_CONVS):
suffix = 'n{}_fpn{}'.format(nconv, lvl)
                dim_out = dim_in  # the tower convs preserve the channel dim
if lvl == k_min:
bl_out = model.Conv(
bl_in,
'retnet_bbox_conv_' + suffix,
dim_in,
dim_out,
3,
stride=1,
pad=1,
weight_init=('GaussianFill', {
'std': 0.01
}),
bias_init=('ConstantFill', {
'value': 0.
})
)
else:
bl_out = model.ConvShared(
bl_in,
'retnet_bbox_conv_' + suffix,
dim_in,
dim_out,
3,
stride=1,
pad=1,
weight='retnet_bbox_conv_n{}_fpn{}_w'.format(
nconv, k_min
),
bias='retnet_bbox_conv_n{}_fpn{}_b'.format(
nconv, k_min
)
)
bl_in = model.Relu(bl_out, bl_out)
            # At least one convolution is applied so the tower can deal with
            # the different octave scales and aspect ratios
bl_feat = bl_in
bbox_feat_list.append(bl_feat)
# Depending on the features [shared/separate] for bbox, add prediction layer
for i, lvl in enumerate(range(k_min, k_max + 1)):
bbox_pred = 'retnet_bbox_pred_fpn{}'.format(lvl)
bl_feat = bbox_feat_list[i]
if lvl == k_min:
model.Conv(
bl_feat,
bbox_pred,
dim_in,
bbox_regr_dim * A,
3,
pad=1,
stride=1,
weight_init=('GaussianFill', {
'std': 0.01
}),
bias_init=('ConstantFill', {
'value': 0.
})
)
else:
model.ConvShared(
bl_feat,
bbox_pred,
dim_in,
bbox_regr_dim * A,
3,
pad=1,
stride=1,
weight='retnet_bbox_pred_fpn{}_w'.format(k_min),
bias='retnet_bbox_pred_fpn{}_b'.format(k_min)
)
def add_fpn_retinanet_losses(model):
loss_gradients = {}
gradients, losses = [], []
k_max = cfg.FPN.RPN_MAX_LEVEL # coarsest level of pyramid
k_min = cfg.FPN.RPN_MIN_LEVEL # finest level of pyramid
model.AddMetrics(['retnet_fg_num', 'retnet_bg_num'])
# ==========================================================================
# bbox regression loss - SelectSmoothL1Loss for multiple anchors at a location
# ==========================================================================
for lvl in range(k_min, k_max + 1):
suffix = 'fpn{}'.format(lvl)
bbox_loss = model.net.SelectSmoothL1Loss(
[
'retnet_bbox_pred_' + suffix,
'retnet_roi_bbox_targets_' + suffix,
'retnet_roi_fg_bbox_locs_' + suffix, 'retnet_fg_num'
],
'retnet_loss_bbox_' + suffix,
beta=cfg.RETINANET.BBOX_REG_BETA,
scale=1. / cfg.NUM_GPUS * cfg.RETINANET.BBOX_REG_WEIGHT
)
gradients.append(bbox_loss)
losses.append('retnet_loss_bbox_' + suffix)
# ==========================================================================
# cls loss - depends on softmax/sigmoid outputs
# ==========================================================================
for lvl in range(k_min, k_max + 1):
suffix = 'fpn{}'.format(lvl)
cls_lvl_logits = 'retnet_cls_pred_' + suffix
if not cfg.RETINANET.SOFTMAX:
cls_focal_loss = model.net.SigmoidFocalLoss(
[
cls_lvl_logits, 'retnet_cls_labels_' + suffix,
'retnet_fg_num'
],
['fl_{}'.format(suffix)],
gamma=cfg.RETINANET.LOSS_GAMMA,
alpha=cfg.RETINANET.LOSS_ALPHA,
scale=(1. / cfg.NUM_GPUS)
)
gradients.append(cls_focal_loss)
losses.append('fl_{}'.format(suffix))
else:
cls_focal_loss, gated_prob = model.net.SoftmaxFocalLoss(
[
cls_lvl_logits, 'retnet_cls_labels_' + suffix,
'retnet_fg_num'
],
['fl_{}'.format(suffix), 'retnet_prob_{}'.format(suffix)],
gamma=cfg.RETINANET.LOSS_GAMMA,
alpha=cfg.RETINANET.LOSS_ALPHA,
scale=(1. / cfg.NUM_GPUS),
)
gradients.append(cls_focal_loss)
losses.append('fl_{}'.format(suffix))
loss_gradients.update(blob_utils.get_loss_gradients(model, gradients))
model.AddLosses(losses)
return loss_gradients
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
from utils.c2 import const_fill
from utils.c2 import gauss_fill
# ---------------------------------------------------------------------------- #
# R-FCN outputs and losses
# ---------------------------------------------------------------------------- #
def add_rfcn_outputs(model, blob_in, dim_in, dim_reduce, spatial_scale):
if dim_reduce is not None:
# Optional dim reduction
blob_in = model.Conv(
blob_in,
'conv_dim_reduce',
dim_in,
dim_reduce,
kernel=1,
pad=0,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
blob_in = model.Relu(blob_in, blob_in)
dim_in = dim_reduce
# Classification conv
model.Conv(
blob_in,
'conv_cls',
dim_in,
model.num_classes * cfg.RFCN.PS_GRID_SIZE**2,
kernel=1,
pad=0,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
    # Bounding-box regression conv
num_bbox_reg_classes = (
2 if cfg.MODEL.CLS_AGNOSTIC_BBOX_REG else model.num_classes
)
model.Conv(
blob_in,
'conv_bbox_pred',
dim_in,
4 * num_bbox_reg_classes * cfg.RFCN.PS_GRID_SIZE**2,
kernel=1,
pad=0,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
# Classification PS RoI pooling
model.net.PSRoIPool(
['conv_cls', 'rois'], ['psroipooled_cls', '_mapping_channel_cls'],
group_size=cfg.RFCN.PS_GRID_SIZE,
output_dim=model.num_classes,
spatial_scale=spatial_scale
)
model.AveragePool(
'psroipooled_cls', 'cls_score_4d', kernel=cfg.RFCN.PS_GRID_SIZE
)
model.net.Reshape(
'cls_score_4d', ['cls_score', '_cls_scores_shape'],
shape=(-1, cfg.MODEL.NUM_CLASSES)
)
if not model.train:
model.Softmax('cls_score', 'cls_prob', engine='CUDNN')
# Bbox regression PS RoI pooling
model.net.PSRoIPool(
['conv_bbox_pred', 'rois'],
['psroipooled_bbox', '_mapping_channel_bbox'],
group_size=cfg.RFCN.PS_GRID_SIZE,
output_dim=4 * num_bbox_reg_classes,
spatial_scale=spatial_scale
)
model.AveragePool(
'psroipooled_bbox', 'bbox_pred', kernel=cfg.RFCN.PS_GRID_SIZE
)
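# Worked channel-count example for the convs above (illustrative values; the
# real ones come from the config): with PS_GRID_SIZE k = 7 and 81 COCO
# classes, conv_cls has 81 * 7**2 = 3969 output channels and conv_bbox_pred
# has 4 * 81 * 7**2 = 15876 (or 4 * 2 * 7**2 = 392 if class-agnostic).
# PSRoIPool then turns the 3969-channel map into an 81-channel 7 x 7
# position-sensitive score map per RoI, which the 7 x 7 AveragePool collapses
# to a single score per class.
k, num_classes = 7, 81
assert num_classes * k**2 == 3969
assert 4 * num_classes * k**2 == 15876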
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from core.config import cfg
from modeling.generate_anchors import generate_anchors
from utils.c2 import const_fill
from utils.c2 import gauss_fill
import modeling.FPN as FPN
import utils.blob as blob_utils
# ---------------------------------------------------------------------------- #
# RPN and Faster R-CNN outputs and losses
# ---------------------------------------------------------------------------- #
def add_generic_rpn_outputs(model, blob_in, dim_in, spatial_scale_in):
"""Add RPN outputs (objectness classification and bounding box regression)
to an RPN model. Abstracts away the use of FPN.
"""
loss_gradients = None
if cfg.FPN.FPN_ON:
# Delegate to the FPN module
FPN.add_fpn_rpn_outputs(model, blob_in, dim_in, spatial_scale_in)
if cfg.MODEL.FASTER_RCNN:
# CollectAndDistributeFpnRpnProposals also labels proposals when in
# training mode
model.CollectAndDistributeFpnRpnProposals()
if model.train:
loss_gradients = FPN.add_fpn_rpn_losses(model)
else:
# Not using FPN, add RPN to a single scale
add_single_scale_rpn_outputs(model, blob_in, dim_in, spatial_scale_in)
if model.train:
loss_gradients = add_single_scale_rpn_losses(model)
return loss_gradients
def add_single_scale_rpn_outputs(model, blob_in, dim_in, spatial_scale):
"""Add RPN outputs to a single scale model (i.e., no FPN)."""
anchors = generate_anchors(
stride=1. / spatial_scale,
sizes=cfg.RPN.SIZES,
aspect_ratios=cfg.RPN.ASPECT_RATIOS
)
num_anchors = anchors.shape[0]
dim_out = dim_in
# RPN hidden representation
model.Conv(
blob_in,
'conv_rpn',
dim_in,
dim_out,
kernel=3,
pad=1,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
model.Relu('conv_rpn', 'conv_rpn')
# Proposal classification scores
model.Conv(
'conv_rpn',
'rpn_cls_logits',
dim_in,
num_anchors,
kernel=1,
pad=0,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
# Proposal bbox regression deltas
model.Conv(
'conv_rpn',
'rpn_bbox_pred',
dim_in,
4 * num_anchors,
kernel=1,
pad=0,
stride=1,
weight_init=gauss_fill(0.01),
bias_init=const_fill(0.0)
)
if not model.train or cfg.MODEL.FASTER_RCNN:
# Proposals are needed during:
# 1) inference (== not model.train) for RPN only and Faster R-CNN
# OR
# 2) training for Faster R-CNN
# Otherwise (== training for RPN only), proposals are not needed
model.net.Sigmoid('rpn_cls_logits', 'rpn_cls_probs')
model.GenerateProposals(
['rpn_cls_probs', 'rpn_bbox_pred', 'im_info'],
['rpn_rois', 'rpn_roi_probs'],
anchors=anchors,
spatial_scale=spatial_scale
)
if cfg.MODEL.FASTER_RCNN:
if model.train:
# Add op that generates training labels for in-network RPN proposals
model.GenerateProposalLabels(['rpn_rois', 'roidb', 'im_info'])
else:
# Alias rois to rpn_rois for inference
model.net.Alias('rpn_rois', 'rois')
def add_single_scale_rpn_losses(model):
"""Add losses for a single scale RPN model (i.e., no FPN)."""
# Spatially narrow the full-sized RPN label arrays to match the feature map
# shape
model.net.SpatialNarrowAs(
['rpn_labels_int32_wide', 'rpn_cls_logits'], 'rpn_labels_int32'
)
for key in ('targets', 'inside_weights', 'outside_weights'):
model.net.SpatialNarrowAs(
['rpn_bbox_' + key + '_wide', 'rpn_bbox_pred'], 'rpn_bbox_' + key
)
loss_rpn_cls = model.net.SigmoidCrossEntropyLoss(
['rpn_cls_logits', 'rpn_labels_int32'],
'loss_rpn_cls',
scale=1. / cfg.NUM_GPUS
)
loss_rpn_bbox = model.net.SmoothL1Loss(
[
'rpn_bbox_pred', 'rpn_bbox_targets', 'rpn_bbox_inside_weights',
'rpn_bbox_outside_weights'
],
'loss_rpn_bbox',
beta=1. / 9.,
scale=1. / cfg.NUM_GPUS
)
loss_gradients = blob_utils.get_loss_gradients(
model, [loss_rpn_cls, loss_rpn_bbox]
)
model.AddLosses(['loss_rpn_cls', 'loss_rpn_bbox'])
return loss_gradients
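# A minimal NumPy sketch of the elementwise smooth L1 used by SmoothL1Loss
# above (the real op additionally applies the inside/outside weights and the
# 1 / NUM_GPUS scale). beta = 1/9 matches the sigma = 3 convention inherited
# from py-faster-rcnn.
import numpy as np
def smooth_l1_sketch(x, beta=1. / 9.):
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * x ** 2 / beta, ax - 0.5 * beta)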
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
from core.config import cfg
from datasets import json_dataset
import modeling.FPN as fpn
import roi_data.fast_rcnn
import utils.blob as blob_utils
class CollectAndDistributeFpnRpnProposalsOp(object):
def __init__(self, train):
self._train = train
def forward(self, inputs, outputs):
"""See modeling.detector.CollectAndDistributeFpnRpnProposals for
inputs/outputs documentation.
"""
# inputs is
# [rpn_rois_fpn2, ..., rpn_rois_fpn6,
# rpn_roi_probs_fpn2, ..., rpn_roi_probs_fpn6]
# If training with Faster R-CNN, then inputs will additionally include
# + [roidb, im_info]
rois = collect(inputs, self._train)
if self._train:
# During training we reuse the data loader code. We populate roidb
# entries on the fly using the rois generated by RPN.
# im_info: [[im_height, im_width, im_scale], ...]
im_info = inputs[-1].data
im_scales = im_info[:, 2]
roidb = blob_utils.deserialize(inputs[-2].data)
# For historical consistency with the original Faster R-CNN
# implementation we are *not* filtering crowd proposals.
# This choice should be investigated in the future (it likely does
# not matter).
json_dataset.add_proposals(roidb, rois, im_scales, crowd_thresh=0)
# Compute training labels for the RPN proposals; also handles
# distributing the proposals over FPN levels
output_blob_names = roi_data.fast_rcnn.get_fast_rcnn_blob_names()
blobs = {k: [] for k in output_blob_names}
roi_data.fast_rcnn.add_fast_rcnn_blobs(blobs, im_scales, roidb)
for i, k in enumerate(output_blob_names):
blob_utils.py_op_copy_blob(blobs[k], outputs[i])
else:
# For inference we have a special code path that avoids some data
# loader overhead
distribute(rois, None, outputs, self._train)
def collect(inputs, is_training):
cfg_key = 'TRAIN' if is_training else 'TEST'
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
k_max = cfg.FPN.RPN_MAX_LEVEL
k_min = cfg.FPN.RPN_MIN_LEVEL
num_lvls = k_max - k_min + 1
roi_inputs = inputs[:num_lvls]
score_inputs = inputs[num_lvls:]
if is_training:
score_inputs = score_inputs[:-2]
    # rois are in [[batch_idx, x0, y0, x1, y1], ...] format
# Combine predictions across all levels and retain the top scoring
rois = np.concatenate([blob.data for blob in roi_inputs])
scores = np.concatenate([blob.data for blob in score_inputs]).squeeze()
inds = np.argsort(-scores)[:post_nms_topN]
rois = rois[inds, :]
return rois
def distribute(rois, label_blobs, outputs, train):
"""To understand the output blob order see return value of
roi_data.fast_rcnn.get_fast_rcnn_blob_names(is_training=False)
"""
lvl_min = cfg.FPN.ROI_MIN_LEVEL
lvl_max = cfg.FPN.ROI_MAX_LEVEL
lvls = fpn.map_rois_to_fpn_levels(rois[:, 1:5], lvl_min, lvl_max)
outputs[0].reshape(rois.shape)
outputs[0].data[...] = rois
# Create new roi blobs for each FPN level
# (See: modeling.FPN.add_multilevel_roi_blobs which is similar but annoying
# to generalize to support this particular case.)
rois_idx_order = np.empty((0, ))
for output_idx, lvl in enumerate(range(lvl_min, lvl_max + 1)):
idx_lvl = np.where(lvls == lvl)[0]
blob_roi_level = rois[idx_lvl, :]
outputs[output_idx + 1].reshape(blob_roi_level.shape)
outputs[output_idx + 1].data[...] = blob_roi_level
rois_idx_order = np.concatenate((rois_idx_order, idx_lvl))
rois_idx_restore = np.argsort(rois_idx_order)
blob_utils.py_op_copy_blob(rois_idx_restore.astype(np.int32), outputs[-1])
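# A sketch of the level-assignment heuristic applied by
# fpn.map_rois_to_fpn_levels above (Eqn. 1 of the FPN paper), assuming the
# canonical scale 224 and canonical level 4 from that paper:
import numpy as np
def map_rois_to_fpn_levels_sketch(rois, k_min, k_max, s0=224., lvl0=4):
    """rois: (N, 4) boxes in (x1, y1, x2, y2) format."""
    w = rois[:, 2] - rois[:, 0] + 1
    h = rois[:, 3] - rois[:, 1] + 1
    s = np.sqrt(w * h)                                   # box scale
    target_lvls = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
    return np.clip(target_lvls, k_min, k_max)            # clamp to [k_min, k_max]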
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
from datasets import json_dataset
from utils import blob as blob_utils
import roi_data.fast_rcnn
logger = logging.getLogger(__name__)
class GenerateProposalLabelsOp(object):
def forward(self, inputs, outputs):
"""See modeling.detector.GenerateProposalLabels for inputs/outputs
documentation.
"""
# During training we reuse the data loader code. We populate roidb
# entries on the fly using the rois generated by RPN.
# im_info: [[im_height, im_width, im_scale], ...]
rois = inputs[0].data
roidb = blob_utils.deserialize(inputs[1].data)
im_info = inputs[2].data
im_scales = im_info[:, 2]
output_blob_names = roi_data.fast_rcnn.get_fast_rcnn_blob_names()
# For historical consistency with the original Faster R-CNN
# implementation we are *not* filtering crowd proposals.
# This choice should be investigated in the future (it likely does
# not matter).
json_dataset.add_proposals(roidb, rois, im_scales, crowd_thresh=0)
blobs = {k: [] for k in output_blob_names}
roi_data.fast_rcnn.add_fast_rcnn_blobs(blobs, im_scales, roidb)
for i, k in enumerate(output_blob_names):
blob_utils.py_op_copy_blob(blobs[k], outputs[i])
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
import numpy as np
from core.config import cfg
import utils.boxes as box_utils
class GenerateProposalsOp(object):
"""Output object detection proposals by applying estimated bounding-box
transformations to a set of regular boxes (called "anchors").
"""
def __init__(self, anchors, spatial_scale, train):
self._anchors = anchors
self._num_anchors = self._anchors.shape[0]
self._feat_stride = 1. / spatial_scale
self._train = train
def forward(self, inputs, outputs):
"""See modeling.detector.GenerateProposals for inputs/outputs
documentation.
"""
# 1. for each location i in a (H, W) grid:
# generate A anchor boxes centered on cell i
# apply predicted bbox deltas to each of the A anchors at cell i
# 2. clip predicted boxes to image
# 3. remove predicted boxes with either height or width < threshold
# 4. sort all (proposal, score) pairs by score from highest to lowest
# 5. take the top pre_nms_topN proposals before NMS
# 6. apply NMS with a loose threshold (0.7) to the remaining proposals
# 7. take after_nms_topN proposals after NMS
# 8. return the top proposals
# predicted probability of fg object for each RPN anchor
scores = inputs[0].data
        # predicted anchor transformations
bbox_deltas = inputs[1].data
# input image (height, width, scale), in which scale is the scale factor
# applied to the original dataset image to get the network input image
im_info = inputs[2].data
# 1. Generate proposals from bbox deltas and shifted anchors
height, width = scores.shape[-2:]
# Enumerate all shifted positions on the (H, W) grid
shift_x = np.arange(0, width) * self._feat_stride
shift_y = np.arange(0, height) * self._feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y, copy=False)
        # Convert to (K, 4), K=H*W, where the columns are (dx, dy, dx, dy)
        # shifts pointing to each grid location
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
        # Broadcast anchors over shifts to enumerate all anchors at all positions
# in the (H, W) grid:
# - add A anchors of shape (1, A, 4) to
# - K shifts of shape (K, 1, 4) to get
# - all shifted anchors of shape (K, A, 4)
# - reshape to (K*A, 4) shifted anchors
num_images = inputs[0].shape[0]
A = self._num_anchors
K = shifts.shape[0]
all_anchors = self._anchors[np.newaxis, :, :] + shifts[:, np.newaxis, :]
all_anchors = all_anchors.reshape((K * A, 4))
rois = np.empty((0, 5), dtype=np.float32)
roi_probs = np.empty((0, 1), dtype=np.float32)
for im_i in range(num_images):
im_i_boxes, im_i_probs = self.proposals_for_one_image(
im_info[im_i, :], all_anchors, bbox_deltas[im_i, :, :, :],
scores[im_i, :, :, :]
)
batch_inds = im_i * np.ones(
(im_i_boxes.shape[0], 1), dtype=np.float32
)
im_i_rois = np.hstack((batch_inds, im_i_boxes))
rois = np.append(rois, im_i_rois, axis=0)
roi_probs = np.append(roi_probs, im_i_probs, axis=0)
outputs[0].reshape(rois.shape)
outputs[0].data[...] = rois
if len(outputs) > 1:
outputs[1].reshape(roi_probs.shape)
outputs[1].data[...] = roi_probs
def proposals_for_one_image(
self, im_info, all_anchors, bbox_deltas, scores
):
# Get mode-dependent configuration
cfg_key = 'TRAIN' if self._train else 'TEST'
pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
min_size = cfg[cfg_key].RPN_MIN_SIZE
# Transpose and reshape predicted bbox transformations to get them
# into the same order as the anchors:
# - bbox deltas will be (4 * A, H, W) format from conv output
# - transpose to (H, W, 4 * A)
# - reshape to (H * W * A, 4) where rows are ordered by (H, W, A)
# in slowest to fastest order to match the enumerated anchors
bbox_deltas = bbox_deltas.transpose((1, 2, 0)).reshape((-1, 4))
# Same story for the scores:
# - scores are (A, H, W) format from conv output
# - transpose to (H, W, A)
# - reshape to (H * W * A, 1) where rows are ordered by (H, W, A)
# to match the order of anchors and bbox_deltas
scores = scores.transpose((1, 2, 0)).reshape((-1, 1))
# 4. sort all (proposal, score) pairs by score from highest to lowest
# 5. take top pre_nms_topN (e.g. 6000)
if pre_nms_topN <= 0 or pre_nms_topN >= len(scores):
order = np.argsort(-scores.squeeze())
else:
            # Avoid sorting possibly large arrays; first partition to get the
            # top K unsorted and then sort just those (~20x faster for 200k scores)
inds = np.argpartition(
-scores.squeeze(), pre_nms_topN
)[:pre_nms_topN]
order = np.argsort(-scores[inds].squeeze())
order = inds[order]
bbox_deltas = bbox_deltas[order, :]
all_anchors = all_anchors[order, :]
scores = scores[order]
# Transform anchors into proposals via bbox transformations
proposals = box_utils.bbox_transform(
all_anchors, bbox_deltas, (1.0, 1.0, 1.0, 1.0))
# 2. clip proposals to image (may result in proposals with zero area
# that will be removed in the next step)
proposals = box_utils.clip_tiled_boxes(proposals, im_info[:2])
# 3. remove predicted boxes with either height or width < min_size
keep = _filter_boxes(proposals, min_size, im_info)
proposals = proposals[keep, :]
scores = scores[keep]
# 6. apply loose nms (e.g. threshold = 0.7)
# 7. take after_nms_topN (e.g. 300)
# 8. return the top proposals (-> RoIs top)
if nms_thresh > 0:
keep = box_utils.nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_topN > 0:
keep = keep[:post_nms_topN]
proposals = proposals[keep, :]
scores = scores[keep]
return proposals, scores
def _filter_boxes(boxes, min_size, im_info):
"""Only keep boxes with both sides >= min_size and center within the image.
"""
# Scale min_size to match image scale
min_size *= im_info[2]
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
x_ctr = boxes[:, 0] + ws / 2.
y_ctr = boxes[:, 1] + hs / 2.
keep = np.where(
(ws >= min_size) & (hs >= min_size) &
(x_ctr < im_info[1]) & (y_ctr < im_info[0]))[0]
return keep
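# A NumPy sketch of the delta-to-box transform that box_utils.bbox_transform
# performs in step 1 above (unit weights; the real helper also clamps dw and
# dh to avoid exp overflow):
import numpy as np
def apply_deltas_sketch(boxes, deltas):
    """boxes: (N, 4) anchors as (x1, y1, x2, y2); deltas: (N, 4)."""
    ws = boxes[:, 2] - boxes[:, 0] + 1.0
    hs = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * ws
    ctr_y = boxes[:, 1] + 0.5 * hs
    pred_ctr_x = deltas[:, 0] * ws + ctr_x
    pred_ctr_y = deltas[:, 1] * hs + ctr_y
    pred_w = np.exp(deltas[:, 2]) * ws
    pred_h = np.exp(deltas[:, 3]) * hs
    out = np.zeros_like(boxes, dtype=np.float32)
    out[:, 0] = pred_ctr_x - 0.5 * pred_w        # x1
    out[:, 1] = pred_ctr_y - 0.5 * pred_h        # y1
    out[:, 2] = pred_ctr_x + 0.5 * pred_w - 1.0  # x2
    out[:, 3] = pred_ctr_y + 0.5 * pred_h - 1.0  # y2
    return out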
/**
* Copyright (c) 2016-present, Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "zero_even_op.h"
namespace caffe2 {
template <>
bool ZeroEvenOp<float, CPUContext>::RunOnDevice() {
// Retrieve the input tensor.
const auto& X = Input(0);
CAFFE_ENFORCE(X.ndim() == 1);
// Initialize the output tensor to a copy of the input tensor.
auto* Y = Output(0);
Y->CopyFrom(X);
// Set output elements at even indices to zero.
auto* Y_data = Y->mutable_data<float>();
for (auto i = 0; i < Y->size(); i += 2) {
Y_data[i] = 0.0f;
}
return true;
}
REGISTER_CPU_OPERATOR(ZeroEven, ZeroEvenOp<float, CPUContext>);
OPERATOR_SCHEMA(ZeroEven)
.NumInputs(1)
.NumOutputs(1)
.Input(
0,
"X",
"1D input tensor")
.Output(
0,
"Y",
"1D output tensor");
} // namespace caffe2
/**
* Copyright (c) 2016-present, Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "caffe2/core/context_gpu.h"
#include "zero_even_op.h"
namespace caffe2 {
namespace {
template <typename T>
__global__ void SetEvenIndsToVal(size_t num_even_inds, T val, T* data) {
CUDA_1D_KERNEL_LOOP(i, num_even_inds) {
data[i << 1] = val;
}
}
} // namespace
template <>
bool ZeroEvenOp<float, CUDAContext>::RunOnDevice() {
// Retrieve the input tensor.
const auto& X = Input(0);
CAFFE_ENFORCE(X.ndim() == 1);
// Initialize the output tensor to a copy of the input tensor.
auto* Y = Output(0);
Y->CopyFrom(X);
// Set output elements at even indices to zero.
auto output_size = Y->size();
if (output_size > 0) {
size_t num_even_inds = output_size / 2 + output_size % 2;
SetEvenIndsToVal<float>
<<<CAFFE_GET_BLOCKS(num_even_inds),
CAFFE_CUDA_NUM_THREADS,
0,
context_.cuda_stream()>>>(
num_even_inds,
0.0f,
Y->mutable_data<float>());
}
return true;
}
REGISTER_CUDA_OPERATOR(ZeroEven, ZeroEvenOp<float, CUDAContext>);
} // namespace caffe2
/**
* Copyright (c) 2016-present, Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef ZERO_EVEN_OP_H_
#define ZERO_EVEN_OP_H_
#include "caffe2/core/context.h"
#include "caffe2/core/operator.h"
namespace caffe2 {
/**
* ZeroEven operator. Zeros elements at even indices of an 1D array.
* Elements at odd indices are preserved.
*
* This toy operator is an example of a custom operator and may be a useful
* reference for adding new custom operators to the Detectron codebase.
*/
template <typename T, class Context>
class ZeroEvenOp final : public Operator<Context> {
public:
// Introduce Operator<Context> helper members.
USE_OPERATOR_CONTEXT_FUNCTIONS;
ZeroEvenOp(const OperatorDef& operator_def, Workspace* ws)
: Operator<Context>(operator_def, ws) {}
bool RunOnDevice() override;
};
} // namespace caffe2
#endif // ZERO_EVEN_OP_H_
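# A usage sketch for the ZeroEven toy operator defined above, driven from the
# Caffe2 Python workspace (assumes the custom Detectron ops library has
# already been loaded into the process):
import numpy as np
from caffe2.python import core, workspace
workspace.FeedBlob('X', np.array([1., 2., 3., 4., 5.], dtype=np.float32))
workspace.RunOperatorOnce(core.CreateOperator('ZeroEven', ['X'], ['Y']))
print(workspace.FetchBlob('Y'))  # -> [0., 2., 0., 4., 0.]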
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Common utility functions for RPN and RetinaNet minibtach blobs preparation.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import namedtuple
import logging
import numpy as np
import threading
from core.config import cfg
from modeling.generate_anchors import generate_anchors
import utils.boxes as box_utils
logger = logging.getLogger(__name__)
# The octave and aspect fields are used only by RetinaNet. Octave corresponds
# to the anchor's scale and aspect denotes which aspect ratio is used from the
# range of aspect ratios
FieldOfAnchors = namedtuple(
'FieldOfAnchors', [
'field_of_anchors', 'num_cell_anchors', 'stride', 'field_size',
'octave', 'aspect'
]
)
# Cache for memoizing get_field_of_anchors
_threadlocal_foa = threading.local()
def get_field_of_anchors(
stride, anchor_sizes, anchor_aspect_ratios, octave=None, aspect=None
):
global _threadlocal_foa
if not hasattr(_threadlocal_foa, 'cache'):
_threadlocal_foa.cache = {}
cache_key = str(stride) + str(anchor_sizes) + str(anchor_aspect_ratios)
if cache_key in _threadlocal_foa.cache:
return _threadlocal_foa.cache[cache_key]
# Anchors at a single feature cell
cell_anchors = generate_anchors(
stride=stride, sizes=anchor_sizes, aspect_ratios=anchor_aspect_ratios
)
num_cell_anchors = cell_anchors.shape[0]
# Generate canonical proposals from shifted anchors
# Enumerate all shifted positions on the (H, W) grid
fpn_max_size = cfg.FPN.COARSEST_STRIDE * np.ceil(
cfg.TRAIN.MAX_SIZE / float(cfg.FPN.COARSEST_STRIDE)
)
field_size = int(np.ceil(fpn_max_size / float(stride)))
shifts = np.arange(0, field_size) * stride
shift_x, shift_y = np.meshgrid(shifts, shifts)
shift_x = shift_x.ravel()
shift_y = shift_y.ravel()
shifts = np.vstack((shift_x, shift_y, shift_x, shift_y)).transpose()
    # Broadcast anchors over shifts to enumerate all anchors at all positions
# in the (H, W) grid:
# - add A cell anchors of shape (1, A, 4) to
# - K shifts of shape (K, 1, 4) to get
# - all shifted anchors of shape (K, A, 4)
# - reshape to (K*A, 4) shifted anchors
A = num_cell_anchors
K = shifts.shape[0]
field_of_anchors = (
cell_anchors.reshape((1, A, 4)) +
shifts.reshape((1, K, 4)).transpose((1, 0, 2))
)
field_of_anchors = field_of_anchors.reshape((K * A, 4))
foa = FieldOfAnchors(
field_of_anchors=field_of_anchors.astype(np.float32),
num_cell_anchors=num_cell_anchors,
stride=stride,
field_size=field_size,
octave=octave,
aspect=aspect
)
_threadlocal_foa.cache[cache_key] = foa
return foa
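# A toy NumPy illustration of the broadcast above: A cell anchors plus K grid
# shifts yield all K * A shifted anchors, ordered by (K, A):
import numpy as np
cell_anchors = np.array([[-8, -8, 8, 8], [-16, -16, 16, 16]])      # A = 2
shifts = np.array([[0, 0, 0, 0], [16, 0, 16, 0], [0, 16, 0, 16]])  # K = 3
A, K = cell_anchors.shape[0], shifts.shape[0]
field = (
    cell_anchors.reshape((1, A, 4)) +
    shifts.reshape((1, K, 4)).transpose((1, 0, 2))
)                                  # (K, A, 4)
field = field.reshape((K * A, 4))  # 6 shifted anchors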
def unmap(data, count, inds, fill=0):
"""Unmap a subset of item (data) back to the original set of items (of
size count)"""
if count == len(inds):
return data
if len(data.shape) == 1:
ret = np.empty((count, ), dtype=data.dtype)
ret.fill(fill)
ret[inds] = data
else:
ret = np.empty((count, ) + data.shape[1:], dtype=data.dtype)
ret.fill(fill)
ret[inds, :] = data
return ret
def compute_targets(ex_rois, gt_rois, weights=(1.0, 1.0, 1.0, 1.0)):
"""Compute bounding-box regression targets for an image."""
return box_utils.bbox_transform_inv(ex_rois, gt_rois, weights).astype(
np.float32, copy=False
)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Construct minibatches for Fast R-CNN training. Handles the minibatch blobs
that are specific to Fast R-CNN. Other blobs that are generic to RPN, etc.
are handled by their respective roi_data modules.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
import numpy as np
import numpy.random as npr
from core.config import cfg
import modeling.FPN as fpn
import roi_data.keypoint_rcnn
import roi_data.mask_rcnn
import utils.blob as blob_utils
import utils.boxes as box_utils
logger = logging.getLogger(__name__)
def get_fast_rcnn_blob_names(is_training=True):
"""Fast R-CNN blob names."""
# rois blob: holds R regions of interest, each is a 5-tuple
# (batch_idx, x1, y1, x2, y2) specifying an image batch index and a
# rectangle (x1, y1, x2, y2)
blob_names = ['rois']
    if is_training:
        # labels_int32 blob: R categorical labels in [0, ..., K] for K
        # foreground classes plus background
        blob_names += ['labels_int32']
        # bbox_targets blob: R bounding-box regression targets with 4
        # targets per class
        blob_names += ['bbox_targets']
        # bbox_inside_weights blob: At most 4 targets per roi are active;
        # this binary vector specifies the subset of active targets
        blob_names += ['bbox_inside_weights']
        blob_names += ['bbox_outside_weights']
if is_training and cfg.MODEL.MASK_ON:
# 'mask_rois': RoIs sampled for training the mask prediction branch.
# Shape is (#masks, 5) in format (batch_idx, x1, y1, x2, y2).
blob_names += ['mask_rois']
# 'roi_has_mask': binary labels for the RoIs specified in 'rois'
# indicating if each RoI has a mask or not. Note that in some cases
# a *bg* RoI will have an all -1 (ignore) mask associated with it in
# the case that no fg RoIs can be sampled. Shape is (batchsize).
blob_names += ['roi_has_mask_int32']
# 'masks_int32' holds binary masks for the RoIs specified in
# 'mask_rois'. Shape is (#fg, M * M) where M is the ground truth
# mask size.
blob_names += ['masks_int32']
if is_training and cfg.MODEL.KEYPOINTS_ON:
# 'keypoint_rois': RoIs sampled for training the keypoint prediction
# branch. Shape is (#instances, 5) in format (batch_idx, x1, y1, x2,
# y2).
blob_names += ['keypoint_rois']
# 'keypoint_locations_int32': index of keypoint in
# KRCNN.HEATMAP_SIZE**2 sized array. Shape is (#instances). Used in
# SoftmaxWithLoss.
blob_names += ['keypoint_locations_int32']
# 'keypoint_weights': weight assigned to each target in
# 'keypoint_locations_int32'. Shape is (#instances). Used in
# SoftmaxWithLoss.
blob_names += ['keypoint_weights']
# 'keypoint_loss_normalizer': optional normalization factor to use if
# cfg.KRCNN.NORMALIZE_BY_VISIBLE_KEYPOINTS is False.
blob_names += ['keypoint_loss_normalizer']
if cfg.FPN.FPN_ON and cfg.FPN.MULTILEVEL_ROIS:
# Support for FPN multi-level rois without bbox reg isn't
# implemented (... and may never be implemented)
k_max = cfg.FPN.ROI_MAX_LEVEL
k_min = cfg.FPN.ROI_MIN_LEVEL
# Same format as rois blob, but one per FPN level
for lvl in range(k_min, k_max + 1):
blob_names += ['rois_fpn' + str(lvl)]
blob_names += ['rois_idx_restore_int32']
if is_training:
if cfg.MODEL.MASK_ON:
for lvl in range(k_min, k_max + 1):
blob_names += ['mask_rois_fpn' + str(lvl)]
blob_names += ['mask_rois_idx_restore_int32']
if cfg.MODEL.KEYPOINTS_ON:
for lvl in range(k_min, k_max + 1):
blob_names += ['keypoint_rois_fpn' + str(lvl)]
blob_names += ['keypoint_rois_idx_restore_int32']
return blob_names
def add_fast_rcnn_blobs(blobs, im_scales, roidb):
"""Add blobs needed for training Fast R-CNN style models."""
# Sample training RoIs from each image and append them to the blob lists
for im_i, entry in enumerate(roidb):
frcn_blobs = _sample_rois(entry, im_scales[im_i], im_i)
for k, v in frcn_blobs.items():
blobs[k].append(v)
# Concat the training blob lists into tensors
for k, v in blobs.items():
if isinstance(v, list) and len(v) > 0:
blobs[k] = np.concatenate(v)
# Add FPN multilevel training RoIs, if configured
if cfg.FPN.FPN_ON and cfg.FPN.MULTILEVEL_ROIS:
_add_multilevel_rois(blobs)
    # Perform any final work and validity checks after collating the blobs for
    # all minibatch images
valid = True
if cfg.MODEL.KEYPOINTS_ON:
valid = roi_data.keypoint_rcnn.finalize_keypoint_minibatch(blobs, valid)
return valid
def _sample_rois(roidb, im_scale, batch_idx):
"""Generate a random sample of RoIs comprising foreground and background
examples.
"""
rois_per_image = int(cfg.TRAIN.BATCH_SIZE_PER_IM)
fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))
max_overlaps = roidb['max_overlaps']
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
# Guard against the case when an image has fewer than fg_rois_per_image
# foreground RoIs
fg_rois_per_this_image = np.minimum(fg_rois_per_image, fg_inds.size)
# Sample foreground regions without replacement
if fg_inds.size > 0:
fg_inds = npr.choice(
fg_inds, size=fg_rois_per_this_image, replace=False
)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where(
(max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
(max_overlaps >= cfg.TRAIN.BG_THRESH_LO)
)[0]
# Compute number of background RoIs to take from this image (guarding
# against there being fewer than desired)
bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
bg_rois_per_this_image = np.minimum(bg_rois_per_this_image, bg_inds.size)
    # Sample background regions without replacement
if bg_inds.size > 0:
bg_inds = npr.choice(
bg_inds, size=bg_rois_per_this_image, replace=False
)
# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)
# Label is the class each RoI has max overlap with
sampled_labels = roidb['max_classes'][keep_inds]
sampled_labels[fg_rois_per_this_image:] = 0 # Label bg RoIs with class 0
sampled_boxes = roidb['boxes'][keep_inds]
if 'bbox_targets' not in roidb:
gt_inds = np.where(roidb['gt_classes'] > 0)[0]
gt_boxes = roidb['boxes'][gt_inds, :]
gt_assignments = gt_inds[roidb['box_to_gt_ind_map'][keep_inds]]
bbox_targets = _compute_targets(
sampled_boxes, gt_boxes[gt_assignments, :], sampled_labels
)
bbox_targets, bbox_inside_weights = _expand_bbox_targets(bbox_targets)
else:
bbox_targets, bbox_inside_weights = _expand_bbox_targets(
roidb['bbox_targets'][keep_inds, :]
)
bbox_outside_weights = np.array(
bbox_inside_weights > 0, dtype=bbox_inside_weights.dtype
)
# Scale rois and format as (batch_idx, x1, y1, x2, y2)
sampled_rois = sampled_boxes * im_scale
repeated_batch_idx = batch_idx * blob_utils.ones((sampled_rois.shape[0], 1))
sampled_rois = np.hstack((repeated_batch_idx, sampled_rois))
# Base Fast R-CNN blobs
blob_dict = dict(
labels_int32=sampled_labels.astype(np.int32, copy=False),
rois=sampled_rois,
bbox_targets=bbox_targets,
bbox_inside_weights=bbox_inside_weights,
bbox_outside_weights=bbox_outside_weights
)
# Optionally add Mask R-CNN blobs
if cfg.MODEL.MASK_ON:
roi_data.mask_rcnn.add_mask_rcnn_blobs(
blob_dict, sampled_boxes, roidb, im_scale, batch_idx
)
# Optionally add Keypoint R-CNN blobs
if cfg.MODEL.KEYPOINTS_ON:
roi_data.keypoint_rcnn.add_keypoint_rcnn_blobs(
blob_dict, roidb, fg_rois_per_image, fg_inds, im_scale, batch_idx
)
return blob_dict
def _compute_targets(ex_rois, gt_rois, labels):
"""Compute bounding-box regression targets for an image."""
assert ex_rois.shape[0] == gt_rois.shape[0]
assert ex_rois.shape[1] == 4
assert gt_rois.shape[1] == 4
targets = box_utils.bbox_transform_inv(
ex_rois, gt_rois, cfg.MODEL.BBOX_REG_WEIGHTS
)
return np.hstack((labels[:, np.newaxis], targets)).astype(
np.float32, copy=False
)
def _expand_bbox_targets(bbox_target_data):
"""Bounding-box regression targets are stored in a compact form in the
roidb.
This function expands those targets into the 4-of-4*K representation used
by the network (i.e. only one class has non-zero targets). The loss weights
are similarly expanded.
Returns:
bbox_target_data (ndarray): N x 4K blob of regression targets
bbox_inside_weights (ndarray): N x 4K blob of loss weights
"""
num_bbox_reg_classes = cfg.MODEL.NUM_CLASSES
if cfg.MODEL.CLS_AGNOSTIC_BBOX_REG:
num_bbox_reg_classes = 2 # bg and fg
clss = bbox_target_data[:, 0]
bbox_targets = blob_utils.zeros((clss.size, 4 * num_bbox_reg_classes))
bbox_inside_weights = blob_utils.zeros(bbox_targets.shape)
inds = np.where(clss > 0)[0]
for ind in inds:
cls = int(clss[ind])
start = 4 * cls
end = start + 4
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
bbox_inside_weights[ind, start:end] = (1.0, 1.0, 1.0, 1.0)
return bbox_targets, bbox_inside_weights
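# A tiny worked example of the 4-of-4K expansion above (assuming K = 3
# classes; each row has 4 * 3 = 12 slots and only the 4 belonging to the
# sampled class are active):
import numpy as np
row = np.array([2, 0.1, 0.2, 0.3, 0.4])  # class label 2 + 4 raw targets
K = 3
targets = np.zeros(4 * K)
weights = np.zeros(4 * K)
cls = int(row[0])
targets[4 * cls:4 * cls + 4] = row[1:]   # [0]*8 + [0.1, 0.2, 0.3, 0.4]
weights[4 * cls:4 * cls + 4] = 1.0       # only those 4 slots contribute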
def _add_multilevel_rois(blobs):
"""By default training RoIs are added for a single feature map level only.
When using FPN, the RoIs must be distributed over different FPN levels
    according to the level assignment heuristic (see: modeling.FPN.
map_rois_to_fpn_levels).
"""
lvl_min = cfg.FPN.ROI_MIN_LEVEL
lvl_max = cfg.FPN.ROI_MAX_LEVEL
def _distribute_rois_over_fpn_levels(rois_blob_name):
"""Distribute rois over the different FPN levels."""
# Get target level for each roi
# Recall blob rois are in (batch_idx, x1, y1, x2, y2) format, hence take
# the box coordinates from columns 1:5
target_lvls = fpn.map_rois_to_fpn_levels(
blobs[rois_blob_name][:, 1:5], lvl_min, lvl_max
)
# Add per FPN level roi blobs named like: <rois_blob_name>_fpn<lvl>
fpn.add_multilevel_roi_blobs(
blobs, rois_blob_name, blobs[rois_blob_name], target_lvls, lvl_min,
lvl_max
)
_distribute_rois_over_fpn_levels('rois')
if cfg.MODEL.MASK_ON:
_distribute_rois_over_fpn_levels('mask_rois')
if cfg.MODEL.KEYPOINTS_ON:
_distribute_rois_over_fpn_levels('keypoint_rois')
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Construct minibatches for Mask R-CNN training when keypoints are enabled.
Handles the minibatch blobs that are specific to training Mask R-CNN for
keypoint detection. Other blobs that are generic to RPN or Fast/er R-CNN are
handled by their respective roi_data modules.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
import numpy as np
from core.config import cfg
import utils.blob as blob_utils
import utils.keypoints as keypoint_utils
logger = logging.getLogger(__name__)
def add_keypoint_rcnn_blobs(
blobs, roidb, fg_rois_per_image, fg_inds, im_scale, batch_idx
):
"""Add Mask R-CNN keypoint specific blobs to the given blobs dictionary."""
# Note: gt_inds must match how they're computed in
# datasets.json_dataset._merge_proposal_boxes_into_roidb
gt_inds = np.where(roidb['gt_classes'] > 0)[0]
max_overlaps = roidb['max_overlaps']
gt_keypoints = roidb['gt_keypoints']
ind_kp = gt_inds[roidb['box_to_gt_ind_map']]
within_box = _within_box(gt_keypoints[ind_kp, :, :], roidb['boxes'])
vis_kp = gt_keypoints[ind_kp, 2, :] > 0
is_visible = np.sum(np.logical_and(vis_kp, within_box), axis=1) > 0
kp_fg_inds = np.where(
np.logical_and(max_overlaps >= cfg.TRAIN.FG_THRESH, is_visible)
)[0]
kp_fg_rois_per_this_image = np.minimum(fg_rois_per_image, kp_fg_inds.size)
if kp_fg_inds.size > kp_fg_rois_per_this_image:
kp_fg_inds = np.random.choice(
kp_fg_inds, size=kp_fg_rois_per_this_image, replace=False
)
sampled_fg_rois = roidb['boxes'][kp_fg_inds]
box_to_gt_ind_map = roidb['box_to_gt_ind_map'][kp_fg_inds]
num_keypoints = gt_keypoints.shape[2]
sampled_keypoints = -np.ones(
(len(sampled_fg_rois), gt_keypoints.shape[1], num_keypoints),
dtype=gt_keypoints.dtype
)
for ii in range(len(sampled_fg_rois)):
ind = box_to_gt_ind_map[ii]
if ind >= 0:
sampled_keypoints[ii, :, :] = gt_keypoints[gt_inds[ind], :, :]
assert np.sum(sampled_keypoints[ii, 2, :]) > 0
heats, weights = keypoint_utils.keypoints_to_heatmap_labels(
sampled_keypoints, sampled_fg_rois
)
shape = (sampled_fg_rois.shape[0] * cfg.KRCNN.NUM_KEYPOINTS, 1)
heats = heats.reshape(shape)
weights = weights.reshape(shape)
sampled_fg_rois *= im_scale
repeated_batch_idx = batch_idx * blob_utils.ones(
(sampled_fg_rois.shape[0], 1)
)
sampled_fg_rois = np.hstack((repeated_batch_idx, sampled_fg_rois))
blobs['keypoint_rois'] = sampled_fg_rois
blobs['keypoint_locations_int32'] = heats.astype(np.int32, copy=False)
blobs['keypoint_weights'] = weights
def finalize_keypoint_minibatch(blobs, valid):
"""Finalize the minibatch after blobs for all minibatch images have been
collated.
"""
min_count = cfg.KRCNN.MIN_KEYPOINT_COUNT_FOR_VALID_MINIBATCH
num_visible_keypoints = np.sum(blobs['keypoint_weights'])
valid = (
valid and len(blobs['keypoint_weights']) > 0 and
num_visible_keypoints > min_count
)
# Normalizer to use if cfg.KRCNN.NORMALIZE_BY_VISIBLE_KEYPOINTS is False.
# See modeling.model_builder.add_keypoint_losses
norm = num_visible_keypoints / (
cfg.TRAIN.IMS_PER_BATCH * cfg.TRAIN.BATCH_SIZE_PER_IM *
cfg.TRAIN.FG_FRACTION * cfg.KRCNN.NUM_KEYPOINTS
)
blobs['keypoint_loss_normalizer'] = np.array(norm, dtype=np.float32)
return valid
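# Illustrative normalizer arithmetic (assumed values, not from the config):
# with 2 images per batch, 512 RoIs per image, fg fraction 0.25, and 17 COCO
# keypoints, the denominator is 2 * 512 * 0.25 * 17 = 4352, so 1000 visible
# keypoints would yield a normalizer of roughly 0.23.
ims, rois_per_im, fg_frac, num_kps = 2, 512, 0.25, 17
denom = ims * rois_per_im * fg_frac * num_kps  # 4352.0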
def _within_box(points, boxes):
"""Validate which keypoints are contained inside a given box.
points: Nx2xK
boxes: Nx4
output: NxK
"""
x_within = np.logical_and(
points[:, 0, :] >= np.expand_dims(boxes[:, 0], axis=1),
points[:, 0, :] <= np.expand_dims(boxes[:, 2], axis=1)
)
y_within = np.logical_and(
points[:, 1, :] >= np.expand_dims(boxes[:, 1], axis=1),
points[:, 1, :] <= np.expand_dims(boxes[:, 3], axis=1)
)
return np.logical_and(x_within, y_within)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Detectron data loader. The design is generic and abstracted away from any
details of the minibatch. A minibatch is a dictionary of blob name keys and
their associated numpy (float32 or int32) ndarray values.
Outline of the data loader design:
loader thread\
loader thread \ / GPU 1 enqueue thread -> feed -> EnqueueOp
... -> minibatch queue -> ...
loader thread / \ GPU N enqueue thread -> feed -> EnqueueOp
loader thread/
<---------------------------- CPU -----------------------------|---- GPU ---->
A pool of loader threads construct minibatches that are put onto the shared
minibatch queue. Each GPU has an enqueue thread that pulls a minibatch off the
minibatch queue, feeds the minibatch blobs into the workspace, and then runs
an EnqueueBlobsOp to place the minibatch blobs into the GPU's blobs queue.
During each fprop the first thing the network does is run a DequeueBlobsOp
in order to populate the workspace with the blobs from a queued minibatch.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import deque
from collections import OrderedDict
import logging
import numpy as np
import Queue
import signal
import threading
import time
import uuid
from caffe2.python import core, workspace
from core.config import cfg
from roi_data.minibatch import get_minibatch
from roi_data.minibatch import get_minibatch_blob_names
from utils.coordinator import coordinated_get
from utils.coordinator import coordinated_put
from utils.coordinator import Coordinator
import utils.c2 as c2_utils
logger = logging.getLogger(__name__)
class RoIDataLoader(object):
def __init__(
self,
roidb,
num_loaders=4,
minibatch_queue_size=64,
blobs_queue_capacity=8
):
self._roidb = roidb
self._lock = threading.Lock()
self._perm = deque(range(len(self._roidb)))
self._cur = 0 # _perm cursor
# The minibatch queue holds prepared training data in host (CPU) memory
# When training with N > 1 GPUs, each element in the minibatch queue
# is actually a partial minibatch which contributes 1 / N of the
# examples to the overall minibatch
self._minibatch_queue = Queue.Queue(maxsize=minibatch_queue_size)
self._blobs_queue_capacity = blobs_queue_capacity
        # Random queue name in case one instantiates multiple RoIDataLoaders
self._loader_id = uuid.uuid4()
self._blobs_queue_name = 'roi_blobs_queue_{}'.format(self._loader_id)
# Loader threads construct (partial) minibatches and put them on the
# minibatch queue
self._num_loaders = num_loaders
self._num_gpus = cfg.NUM_GPUS
self.coordinator = Coordinator()
self._output_names = get_minibatch_blob_names()
self._shuffle_roidb_inds()
self.create_threads()
def minibatch_loader_thread(self):
"""Load mini-batches and put them onto the mini-batch queue."""
with self.coordinator.stop_on_exception():
while not self.coordinator.should_stop():
blobs = self.get_next_minibatch()
# Blobs must be queued in the order specified by
# self.get_output_names
ordered_blobs = OrderedDict()
for key in self.get_output_names():
assert blobs[key].dtype in (np.int32, np.float32), \
'Blob {} of dtype {} must have dtype of ' \
'np.int32 or np.float32'.format(key, blobs[key].dtype)
ordered_blobs[key] = blobs[key]
coordinated_put(
self.coordinator, self._minibatch_queue, ordered_blobs
)
logger.info('Stopping mini-batch loading thread')
def enqueue_blobs_thread(self, gpu_id, blob_names):
"""Transfer mini-batches from a mini-batch queue to a BlobsQueue."""
with self.coordinator.stop_on_exception():
while not self.coordinator.should_stop():
                if self._minibatch_queue.qsize() == 0:
logger.warning('Mini-batch queue is empty')
blobs = coordinated_get(self.coordinator, self._minibatch_queue)
self.enqueue_blobs(gpu_id, blob_names, blobs.values())
logger.debug(
'batch queue size {}'.format(self._minibatch_queue.qsize())
)
logger.info('Stopping enqueue thread')
def get_next_minibatch(self):
"""Return the blobs to be used for the next minibatch. Thread safe."""
valid = False
while not valid:
db_inds = self._get_next_minibatch_inds()
minibatch_db = [self._roidb[i] for i in db_inds]
blobs, valid = get_minibatch(minibatch_db)
return blobs
def _shuffle_roidb_inds(self):
"""Randomly permute the training roidb. Not thread safe."""
if cfg.TRAIN.ASPECT_GROUPING:
widths = np.array([r['width'] for r in self._roidb])
heights = np.array([r['height'] for r in self._roidb])
horz = (widths >= heights)
vert = np.logical_not(horz)
horz_inds = np.where(horz)[0]
vert_inds = np.where(vert)[0]
inds = np.hstack(
(
np.random.permutation(horz_inds),
np.random.permutation(vert_inds)
)
)
inds = np.reshape(inds, (-1, 2))
row_perm = np.random.permutation(np.arange(inds.shape[0]))
inds = np.reshape(inds[row_perm, :], (-1, ))
self._perm = inds
else:
self._perm = np.random.permutation(np.arange(len(self._roidb)))
self._perm = deque(self._perm)
self._cur = 0
def _get_next_minibatch_inds(self):
"""Return the roidb indices for the next minibatch. Thread safe."""
with self._lock:
# We use a deque and always take the *first* IMS_PER_BATCH items
# followed by *rotating* the deque so that we see fresh items
# each time. If the length of _perm is not divisible by
# IMS_PER_BATCH, then we end up wrapping around the permutation.
db_inds = [self._perm[i] for i in range(cfg.TRAIN.IMS_PER_BATCH)]
self._perm.rotate(-cfg.TRAIN.IMS_PER_BATCH)
self._cur += cfg.TRAIN.IMS_PER_BATCH
if self._cur >= len(self._perm):
self._shuffle_roidb_inds()
return db_inds
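    # A toy illustration of the rotation scheme (5 roidb entries,
    # IMS_PER_BATCH = 2):
    #   perm = deque([3, 1, 4, 0, 2])
    #   [perm[i] for i in range(2)]  -> [3, 1]
    #   perm.rotate(-2)              -> deque([4, 0, 2, 3, 1])
    #   [perm[i] for i in range(2)]  -> [4, 0]
    # Once _cur passes len(perm), _shuffle_roidb_inds reshuffles.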
def get_output_names(self):
return self._output_names
def enqueue_blobs(self, gpu_id, blob_names, blobs):
"""Put a mini-batch on a BlobsQueue."""
assert len(blob_names) == len(blobs)
t = time.time()
dev = c2_utils.CudaDevice(gpu_id)
queue_name = 'gpu_{}/{}'.format(gpu_id, self._blobs_queue_name)
blob_names = ['gpu_{}/{}'.format(gpu_id, b) for b in blob_names]
for (blob_name, blob) in zip(blob_names, blobs):
workspace.FeedBlob(blob_name, blob, device_option=dev)
logger.debug(
'enqueue_blobs {}: workspace.FeedBlob: {}'.
format(gpu_id, time.time() - t)
)
t = time.time()
op = core.CreateOperator(
'SafeEnqueueBlobs', [queue_name] + blob_names,
blob_names + [queue_name + '_enqueue_status'],
device_option=dev
)
workspace.RunOperatorOnce(op)
logger.debug(
'enqueue_blobs {}: workspace.RunOperatorOnce: {}'.
format(gpu_id, time.time() - t)
)
def create_threads(self):
# Create mini-batch loader threads, each of which builds mini-batches
# and places them into a queue in CPU memory
self._workers = [
threading.Thread(target=self.minibatch_loader_thread)
for _ in range(self._num_loaders)
]
# Create one BlobsQueue per GPU
# (enqueue_blob_names are unscoped)
enqueue_blob_names = self.create_blobs_queues()
# Create one enqueuer thread per GPU
self._enqueuers = [
threading.Thread(
target=self.enqueue_blobs_thread,
args=(gpu_id, enqueue_blob_names)
) for gpu_id in range(self._num_gpus)
]
def start(self, prefill=False):
for w in self._workers + self._enqueuers:
w.start()
if prefill:
logger.info('Pre-filling mini-batch queue...')
while not self._minibatch_queue.full():
logger.info(
' [{:d}/{:d}]'.format(
self._minibatch_queue.qsize(),
self._minibatch_queue.maxsize
)
)
time.sleep(0.1)
# Detect failure and shutdown
if self.coordinator.should_stop():
self.shutdown()
break
def shutdown(self):
self.coordinator.request_stop()
self.coordinator.wait_for_stop()
self.close_blobs_queues()
for w in self._workers + self._enqueuers:
w.join()
def create_blobs_queues(self):
"""Create one BlobsQueue for each GPU to hold mini-batches."""
for gpu_id in range(self._num_gpus):
with c2_utils.GpuNameScope(gpu_id):
workspace.RunOperatorOnce(
core.CreateOperator(
'CreateBlobsQueue', [], [self._blobs_queue_name],
num_blobs=len(self.get_output_names()),
capacity=self._blobs_queue_capacity
)
)
return self.create_enqueue_blobs()
def close_blobs_queues(self):
"""Close a BlobsQueue."""
for gpu_id in range(self._num_gpus):
with core.NameScope('gpu_{}'.format(gpu_id)):
workspace.RunOperatorOnce(
core.CreateOperator(
'CloseBlobsQueue', [self._blobs_queue_name], []
)
)
def create_enqueue_blobs(self):
blob_names = self.get_output_names()
enqueue_blob_names = [
'{}_enqueue_{}'.format(b, self._loader_id) for b in blob_names
]
for gpu_id in range(self._num_gpus):
with c2_utils.NamedCudaScope(gpu_id):
for blob in enqueue_blob_names:
workspace.CreateBlob(core.ScopedName(blob))
return enqueue_blob_names
def register_sigint_handler(self):
def signal_handler(signal, frame):
logger.info(
'SIGINT: Shutting down RoIDataLoader threads and exiting...'
)
self.shutdown()
signal.signal(signal.SIGINT, signal_handler)
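# A minimal usage sketch for RoIDataLoader (hypothetical driver code; in
# practice the training tools wire the loader into the detector model and
# each net fprop dequeues a minibatch via DequeueBlobsOp). `roidb` is assumed
# to be an already-loaded dataset roidb list.
def run_with_loader(roidb):
    loader = RoIDataLoader(roidb, num_loaders=4)
    loader.register_sigint_handler()
    loader.start(prefill=True)  # block until the minibatch queue is full
    try:
        pass  # run training here
    finally:
        loader.shutdown()  # stop threads and close the per-GPU BlobsQueues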
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Construct minibatches for Mask R-CNN training. Handles the minibatch blobs
that are specific to Mask R-CNN. Other blobs that are generic to RPN or
Fast/er R-CNN are handled by their respective roi_data modules.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
import numpy as np
from core.config import cfg
import utils.blob as blob_utils
import utils.boxes as box_utils
import utils.segms as segm_utils
logger = logging.getLogger(__name__)
def add_mask_rcnn_blobs(blobs, sampled_boxes, roidb, im_scale, batch_idx):
"""Add Mask R-CNN specific blobs to the input blob dictionary."""
# Prepare the mask targets by associating one gt mask to each training roi
# that has a fg (non-bg) class label.
M = cfg.MRCNN.RESOLUTION
polys_gt_inds = np.where(
(roidb['gt_classes'] > 0) & (roidb['is_crowd'] == 0)
)[0]
polys_gt = [roidb['segms'][i] for i in polys_gt_inds]
boxes_from_polys = segm_utils.polys_to_boxes(polys_gt)
fg_inds = np.where(blobs['labels_int32'] > 0)[0]
roi_has_mask = blobs['labels_int32'].copy()
roi_has_mask[roi_has_mask > 0] = 1
if fg_inds.shape[0] > 0:
# Class labels for the foreground rois
mask_class_labels = blobs['labels_int32'][fg_inds]
masks = blob_utils.zeros((fg_inds.shape[0], M**2), int32=True)
# Find overlap between all foreground rois and the bounding boxes
# enclosing each segmentation
rois_fg = sampled_boxes[fg_inds]
overlaps_bbfg_bbpolys = box_utils.bbox_overlaps(
rois_fg.astype(np.float32, copy=False),
boxes_from_polys.astype(np.float32, copy=False)
)
        # Map from each fg roi to the index of the mask with highest overlap
# (measured by bbox overlap)
fg_polys_inds = np.argmax(overlaps_bbfg_bbpolys, axis=1)
# add fg targets
for i in range(rois_fg.shape[0]):
fg_polys_ind = fg_polys_inds[i]
poly_gt = polys_gt[fg_polys_ind]
roi_fg = rois_fg[i]
# Rasterize the portion of the polygon mask within the given fg roi
# to an M x M binary image
mask = segm_utils.polys_to_mask_wrt_box(poly_gt, roi_fg, M)
mask = np.array(mask > 0, dtype=np.int32) # Ensure it's binary
masks[i, :] = np.reshape(mask, M**2)
else: # If there are no fg masks (it does happen)
# The network cannot handle empty blobs, so we must provide a mask
        # We simply take the first bg roi, give it an all -1's mask (ignore
        # label), and label it with class zero (bg).
bg_inds = np.where(blobs['labels_int32'] == 0)[0]
# rois_fg is actually one background roi, but that's ok because ...
rois_fg = sampled_boxes[bg_inds[0]].reshape((1, -1))
        # We give it an all -1's blob (ignore label)
masks = -blob_utils.ones((1, M**2), int32=True)
# We label it with class = 0 (background)
mask_class_labels = blob_utils.zeros((1, ))
# Mark that the first roi has a mask
roi_has_mask[0] = 1
if cfg.MRCNN.CLS_SPECIFIC_MASK:
masks = _expand_to_class_specific_mask_targets(masks, mask_class_labels)
# Scale rois_fg and format as (batch_idx, x1, y1, x2, y2)
rois_fg *= im_scale
repeated_batch_idx = batch_idx * blob_utils.ones((rois_fg.shape[0], 1))
rois_fg = np.hstack((repeated_batch_idx, rois_fg))
# Update blobs dict with Mask R-CNN blobs
blobs['mask_rois'] = rois_fg
blobs['roi_has_mask_int32'] = roi_has_mask
blobs['masks_int32'] = masks
def _expand_to_class_specific_mask_targets(masks, mask_class_labels):
"""Expand masks from shape (#masks, M ** 2) to (#masks, #classes * M ** 2)
to encode class specific mask targets.
"""
assert masks.shape[0] == mask_class_labels.shape[0]
M = cfg.MRCNN.RESOLUTION
# Target values of -1 are "don't care" / ignore labels
mask_targets = -blob_utils.ones(
(masks.shape[0], cfg.MODEL.NUM_CLASSES * M**2), int32=True
)
for i in range(masks.shape[0]):
cls = int(mask_class_labels[i])
start = M**2 * cls
end = start + M**2
# Ignore background instance
# (only happens when there is no fg samples in an image)
if cls > 0:
mask_targets[i, start:end] = masks[i, :]
return mask_targets
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Construct minibatches for Detectron networks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cv2
import logging
import numpy as np
from core.config import cfg
import roi_data.fast_rcnn
import roi_data.retinanet
import roi_data.rpn
import utils.blob as blob_utils
logger = logging.getLogger(__name__)
def get_minibatch_blob_names(is_training=True):
"""Return blob names in the order in which they are read by the data loader.
"""
# data blob: holds a batch of N images, each with 3 channels
blob_names = ['data']
if cfg.RPN.RPN_ON:
# RPN-only or end-to-end Faster R-CNN
blob_names += roi_data.rpn.get_rpn_blob_names(is_training=is_training)
elif cfg.RETINANET.RETINANET_ON:
blob_names += roi_data.retinanet.get_retinanet_blob_names(
is_training=is_training
)
else:
# Fast R-CNN like models trained on precomputed proposals
blob_names += roi_data.fast_rcnn.get_fast_rcnn_blob_names(
is_training=is_training
)
return blob_names
def get_minibatch(roidb):
"""Given a roidb, construct a minibatch sampled from it."""
# We collect blobs from each image onto a list and then concat them into a
# single tensor, hence we initialize each blob to an empty list
blobs = {k: [] for k in get_minibatch_blob_names()}
# Get the input image blob, formatted for caffe2
im_blob, im_scales = _get_image_blob(roidb)
blobs['data'] = im_blob
if cfg.RPN.RPN_ON:
# RPN-only or end-to-end Faster/Mask R-CNN
valid = roi_data.rpn.add_rpn_blobs(blobs, im_scales, roidb)
elif cfg.RETINANET.RETINANET_ON:
im_width, im_height = im_blob.shape[3], im_blob.shape[2]
# im_width, im_height corresponds to the network input: padded image
# (if needed) width and height. We pass it as input and slice the data
# accordingly so that we don't need to use SampleAsOp
valid = roi_data.retinanet.add_retinanet_blobs(
blobs, im_scales, roidb, im_width, im_height
)
else:
# Fast R-CNN like models trained on precomputed proposals
valid = roi_data.fast_rcnn.add_fast_rcnn_blobs(blobs, im_scales, roidb)
return blobs, valid
def _get_image_blob(roidb):
"""Builds an input blob from the images in the roidb at the specified
scales.
"""
num_images = len(roidb)
# Sample random scales to use for each image in this batch
scale_inds = np.random.randint(
0, high=len(cfg.TRAIN.SCALES), size=num_images
)
processed_ims = []
im_scales = []
for i in range(num_images):
im = cv2.imread(roidb[i]['image'])
if roidb[i]['flipped']:
im = im[:, ::-1, :]
target_size = cfg.TRAIN.SCALES[scale_inds[i]]
im, im_scale = blob_utils.prep_im_for_blob(
im, cfg.PIXEL_MEANS, [target_size], cfg.TRAIN.MAX_SIZE
)
im_scales.append(im_scale[0])
processed_ims.append(im[0])
# Create a blob to hold the input images
blob = blob_utils.im_list_to_blob(processed_ims)
return blob, im_scales
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Compute minibatch blobs for training a RetinaNet network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import logging
import utils.boxes as box_utils
import roi_data.data_utils as data_utils
from core.config import cfg
logger = logging.getLogger(__name__)
def get_retinanet_blob_names(is_training=True):
"""
Returns blob names in the order in which they are read by the data
loader.
N = number of images per minibatch
A = number of anchors = num_scales * num_aspect_ratios
(for example 9 used in RetinaNet paper)
H, W = spatial dimensions (different for each FPN level)
    M = number of positive anchors, determined by the positive/negative IoU
        overlap thresholds out of all the anchors generated. These are the
        anchors that the bounding box branch will regress on.
retnet_cls_labels -> labels for the cls branch for each FPN level
Shape: N x A x H x W
retnet_roi_bbox_targets -> targets for the bbox regression branch
Shape: M x 4
    retnet_roi_fg_bbox_locs -> locations of the fg boxes for the bbox
                               regression branch. Since we regress only on
                               the M fg boxes, while the network prediction
                               has shape N x (A * 4) x H x W (in the
                               non-class-specific case), we store the
                               locations of the positive fg boxes in this
                               blob of shape M x 4, where each row looks
                               like: [img_id, anchor_id, x_loc, y_loc]
"""
# im_info: (height, width, image scale)
blob_names = ['im_info']
assert cfg.FPN.FPN_ON, "RetinaNet uses FPN for dense detection"
# Same format as RPN blobs, but one per FPN level
if is_training:
blob_names += ['retnet_fg_num', 'retnet_bg_num']
for lvl in range(cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.RPN_MAX_LEVEL + 1):
suffix = 'fpn{}'.format(lvl)
blob_names += [
'retnet_cls_labels_' + suffix,
'retnet_roi_bbox_targets_' + suffix,
'retnet_roi_fg_bbox_locs_' + suffix,
]
return blob_names
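# Editor's note: a minimal doctest-style sketch of the names produced above,
# assuming hypothetical config values FPN.RPN_MIN_LEVEL = 3 and
# FPN.RPN_MAX_LEVEL = 7 (not asserted by this file):
#
#   >>> get_retinanet_blob_names(is_training=True)[:5]
#   ['im_info', 'retnet_fg_num', 'retnet_bg_num',
#    'retnet_cls_labels_fpn3', 'retnet_roi_bbox_targets_fpn3']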
def add_retinanet_blobs(blobs, im_scales, roidb, image_width, image_height):
"""Add RetinaNet blobs."""
# RetinaNet is applied to many feature levels, as in the FPN paper
k_max, k_min = cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.RPN_MIN_LEVEL
scales_per_octave = cfg.RETINANET.SCALES_PER_OCTAVE
num_aspect_ratios = len(cfg.RETINANET.ASPECT_RATIOS)
aspect_ratios = cfg.RETINANET.ASPECT_RATIOS
anchor_scale = cfg.RETINANET.ANCHOR_SCALE
# get anchors from all levels for all scales/aspect ratios
foas = []
for lvl in range(k_min, k_max + 1):
stride = 2. ** lvl
for octave in range(scales_per_octave):
octave_scale = 2 ** (octave / float(scales_per_octave))
for idx in range(num_aspect_ratios):
anchor_sizes = (stride * octave_scale * anchor_scale, )
anchor_aspect_ratios = (aspect_ratios[idx], )
foa = data_utils.get_field_of_anchors(
stride, anchor_sizes, anchor_aspect_ratios, octave, idx)
foas.append(foa)
all_anchors = np.concatenate([f.field_of_anchors for f in foas])
blobs['retnet_fg_num'], blobs['retnet_bg_num'] = 0.0, 0.0
for im_i, entry in enumerate(roidb):
scale = im_scales[im_i]
im_height = np.round(entry['height'] * scale)
im_width = np.round(entry['width'] * scale)
gt_inds = np.where(
(entry['gt_classes'] > 0) & (entry['is_crowd'] == 0))[0]
        assert len(gt_inds) > 0, \
            'Empty ground truth for image is not allowed. Please check.'
gt_rois = entry['boxes'][gt_inds, :] * scale
gt_classes = entry['gt_classes'][gt_inds]
im_info = np.array([[im_height, im_width, scale]], dtype=np.float32)
blobs['im_info'].append(im_info)
retinanet_blobs, fg_num, bg_num = _get_retinanet_blobs(
foas, all_anchors, gt_rois, gt_classes, image_width, image_height)
for i, foa in enumerate(foas):
for k, v in retinanet_blobs[i].items():
            # the way it stacks is:
            # [[anchors for image 1] + [anchors for image 2]]
level = int(np.log2(foa.stride))
key = '{}_fpn{}'.format(k, level)
if k == 'retnet_roi_fg_bbox_locs':
v[:, 0] = im_i
# loc_stride: 80 * 4 if cls_specific else 4
                    loc_stride = 4  # 4 coordinates per bbox prediction
if cfg.RETINANET.CLASS_SPECIFIC_BBOX:
loc_stride *= (cfg.MODEL.NUM_CLASSES - 1)
anchor_ind = foa.octave * num_aspect_ratios + foa.aspect
                    # v[:, 1] is the class label [range 0-80] if we do
                    # class-specific bbox, otherwise it is 0. In the
                    # class-specific case, the channel offset of the current
                    # anchor starts at class_label * 4, and we then add the
                    # offset for anchor_ind among the anchors
v[:, 1] *= 4
v[:, 1] += loc_stride * anchor_ind
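                    # Editor's worked example (hypothetical numbers): with 3
                    # scales per octave and 3 aspect ratios, an anchor with
                    # octave=1, aspect=2 has anchor_ind = 1 * 3 + 2 = 5. For
                    # class-agnostic bbox (loc_stride = 4) and v[:, 1] = 0,
                    # the offset becomes 0 * 4 + 4 * 5 = 20, i.e. this
                    # anchor's 4 box coordinates occupy channels 20..23 of
                    # the N x (A * 4) x H x W prediction tensor.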
blobs[key].append(v)
blobs['retnet_fg_num'] += fg_num
blobs['retnet_bg_num'] += bg_num
blobs['retnet_fg_num'] = blobs['retnet_fg_num'].astype(np.float32)
blobs['retnet_bg_num'] = blobs['retnet_bg_num'].astype(np.float32)
N = len(roidb)
for k, v in blobs.items():
if isinstance(v, list) and len(v) > 0:
# compute number of anchors
A = int(len(v) / N)
# for the cls branch labels [per fpn level],
# we have blobs['retnet_cls_labels_fpn{}'] as a list until this step
# and length of this list is N x A where
# N = num_images, A = num_anchors for example, N = 2, A = 9
            # Each element of the list has the shape 1 x 1 x H x W where H, W
            # are the spatial dimensions of the current FPN level. Let a{i}
            # denote the element corresponding to anchor i [9 anchors total]
            # in the list. The elements are in order
            # [[a0, ..., a8], [a0, ..., a8]]
# however the network will make predictions like 2 x (9 * 80) x H x W
# so we first concatenate the elements of each image to a numpy array
# and then concatenate the two images to get the 2 x 9 x H x W
if k.find('retnet_cls_labels') >= 0:
tmp = []
# concat anchors within an image
for i in range(0, len(v), A):
tmp.append(np.concatenate(v[i: i + A], axis=1))
# concat images
blobs[k] = np.concatenate(tmp, axis=0)
else:
# for the bbox branch elements [per FPN level],
# we have the targets and the fg boxes locations
# in the shape: M x 4 where M is the number of fg locations in a
                # given image at the current FPN level. The elements in the
                # list are in order [[a0, ..., a8], [a0, ..., a8]].
# Concatenate them to form M x 4
blobs[k] = np.concatenate(v, axis=0)
return True
def _get_retinanet_blobs(
foas, all_anchors, gt_boxes, gt_classes, im_width, im_height):
total_anchors = all_anchors.shape[0]
    logger.debug('Getting RetinaNet blobs: im_height: {} im_width: {}'.format(
        im_height, im_width))
inds_inside = np.arange(all_anchors.shape[0])
anchors = all_anchors
num_inside = len(inds_inside)
logger.debug('total_anchors: {}'.format(total_anchors))
logger.debug('inds_inside: {}'.format(num_inside))
logger.debug('anchors.shape: {}'.format(anchors.shape))
# Compute anchor labels:
# label=1 is positive, 0 is negative, -1 is don't care (ignore)
labels = np.empty((num_inside, ), dtype=np.float32)
labels.fill(-1)
if len(gt_boxes) > 0:
        # Compute overlaps between the anchors and the gt boxes
anchor_by_gt_overlap = box_utils.bbox_overlaps(anchors, gt_boxes)
# Map from anchor to gt box that has highest overlap
anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(axis=1)
# For each anchor, amount of overlap with most overlapping gt box
anchor_to_gt_max = anchor_by_gt_overlap[
np.arange(num_inside), anchor_to_gt_argmax]
# Map from gt box to an anchor that has highest overlap
gt_to_anchor_argmax = anchor_by_gt_overlap.argmax(axis=0)
# For each gt box, amount of overlap with most overlapping anchor
gt_to_anchor_max = anchor_by_gt_overlap[
gt_to_anchor_argmax, np.arange(anchor_by_gt_overlap.shape[1])]
# Find all anchors that share the max overlap amount
# (this includes many ties)
anchors_with_max_overlap = np.where(
anchor_by_gt_overlap == gt_to_anchor_max)[0]
# Fg label: for each gt use anchors with highest overlap
# (including ties)
gt_inds = anchor_to_gt_argmax[anchors_with_max_overlap]
labels[anchors_with_max_overlap] = gt_classes[gt_inds]
# Fg label: above threshold IOU
inds = anchor_to_gt_max >= cfg.RETINANET.POSITIVE_OVERLAP
gt_inds = anchor_to_gt_argmax[inds]
labels[inds] = gt_classes[gt_inds]
fg_inds = np.where(labels >= 1)[0]
bg_inds = np.where(anchor_to_gt_max < cfg.RETINANET.NEGATIVE_OVERLAP)[0]
labels[bg_inds] = 0
num_fg, num_bg = len(fg_inds), len(bg_inds)
bbox_targets = np.zeros((num_inside, 4), dtype=np.float32)
bbox_targets[fg_inds, :] = data_utils.compute_targets(
anchors[fg_inds, :], gt_boxes[anchor_to_gt_argmax[fg_inds], :])
# Map up to original set of anchors
labels = data_utils.unmap(labels, total_anchors, inds_inside, fill=-1)
bbox_targets = data_utils.unmap(bbox_targets, total_anchors, inds_inside, fill=0)
# Split the generated labels, etc. into labels per each field of anchors
blobs_out = []
start_idx = 0
for foa in foas:
H = foa.field_size
W = foa.field_size
end_idx = start_idx + H * W
_labels = labels[start_idx:end_idx]
_bbox_targets = bbox_targets[start_idx:end_idx, :]
start_idx = end_idx
        # labels output with shape (1, 1, height, width)
        _labels = _labels.reshape((1, 1, H, W))
        # bbox_targets output with shape (1, 4, height, width)
        _bbox_targets = _bbox_targets.reshape((1, H, W, 4)).transpose(0, 3, 1, 2)
stride = foa.stride
w = int(im_width / stride)
h = int(im_height / stride)
# data for select_smooth_l1 loss
num_classes = cfg.MODEL.NUM_CLASSES - 1
inds_4d = np.where(_labels > 0)
        M = len(inds_4d[0])  # number of positive (fg) anchor locations
_roi_bbox_targets = np.zeros((0, 4))
_roi_fg_bbox_locs = np.zeros((0, 4))
if M > 0:
im_inds, y, x = inds_4d[0], inds_4d[2], inds_4d[3]
_roi_bbox_targets = np.zeros((len(im_inds), 4))
_roi_fg_bbox_locs = np.zeros((len(im_inds), 4))
lbls = _labels[im_inds, :, y, x]
for i, lbl in enumerate(lbls):
l = lbl[0] - 1
if not cfg.RETINANET.CLASS_SPECIFIC_BBOX:
l = 0
                assert l >= 0 and l < num_classes, 'label out of range'
_roi_bbox_targets[i, :] = _bbox_targets[:, :, y[i], x[i]]
_roi_fg_bbox_locs[i, :] = np.array([[0, l, y[i], x[i]]])
blobs_out.append(
dict(
retnet_cls_labels=_labels[:, :, 0:h, 0:w].astype(np.int32),
retnet_roi_bbox_targets=_roi_bbox_targets.astype(np.float32),
retnet_roi_fg_bbox_locs=_roi_fg_bbox_locs.astype(np.float32),
))
out_num_fg = np.array([num_fg + 1.0], dtype=np.float32)
out_num_bg = (
np.array([num_bg + 1.0]) * (cfg.MODEL.NUM_CLASSES - 1) +
out_num_fg * (cfg.MODEL.NUM_CLASSES - 2))
return blobs_out, out_num_fg, out_num_bg
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Minibatch construction for Region Proposal Networks (RPN)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
import numpy as np
import numpy.random as npr
from core.config import cfg
import roi_data.data_utils as data_utils
import utils.blob as blob_utils
import utils.boxes as box_utils
logger = logging.getLogger(__name__)
def get_rpn_blob_names(is_training=True):
"""Blob names used by RPN."""
# im_info: (height, width, image scale)
blob_names = ['im_info']
if is_training:
# gt boxes: (batch_idx, x1, y1, x2, y2, cls)
blob_names += ['roidb']
if cfg.FPN.FPN_ON and cfg.FPN.MULTILEVEL_RPN:
# Same format as RPN blobs, but one per FPN level
for lvl in range(cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.RPN_MAX_LEVEL + 1):
blob_names += [
'rpn_labels_int32_wide_fpn' + str(lvl),
'rpn_bbox_targets_wide_fpn' + str(lvl),
'rpn_bbox_inside_weights_wide_fpn' + str(lvl),
'rpn_bbox_outside_weights_wide_fpn' + str(lvl)
]
else:
# Single level RPN blobs
blob_names += [
'rpn_labels_int32_wide',
'rpn_bbox_targets_wide',
'rpn_bbox_inside_weights_wide',
'rpn_bbox_outside_weights_wide'
]
return blob_names
def add_rpn_blobs(blobs, im_scales, roidb):
"""Add blobs needed training RPN-only and end-to-end Faster R-CNN models."""
if cfg.FPN.FPN_ON and cfg.FPN.MULTILEVEL_RPN:
# RPN applied to many feature levels, as in the FPN paper
k_max = cfg.FPN.RPN_MAX_LEVEL
k_min = cfg.FPN.RPN_MIN_LEVEL
foas = []
for lvl in range(k_min, k_max + 1):
field_stride = 2.**lvl
anchor_sizes = (cfg.FPN.RPN_ANCHOR_START_SIZE * 2.**(lvl - k_min), )
anchor_aspect_ratios = cfg.FPN.RPN_ASPECT_RATIOS
foa = data_utils.get_field_of_anchors(
field_stride, anchor_sizes, anchor_aspect_ratios
)
foas.append(foa)
all_anchors = np.concatenate([f.field_of_anchors for f in foas])
else:
foa = data_utils.get_field_of_anchors(
cfg.RPN.STRIDE, cfg.RPN.SIZES, cfg.RPN.ASPECT_RATIOS
)
all_anchors = foa.field_of_anchors
for im_i, entry in enumerate(roidb):
scale = im_scales[im_i]
im_height = np.round(entry['height'] * scale)
im_width = np.round(entry['width'] * scale)
gt_inds = np.where(
(entry['gt_classes'] > 0) & (entry['is_crowd'] == 0)
)[0]
gt_rois = entry['boxes'][gt_inds, :] * scale
# TODO(rbg): gt_boxes is poorly named;
# should be something like 'gt_rois_info'
gt_boxes = blob_utils.zeros((len(gt_inds), 6))
gt_boxes[:, 0] = im_i # batch inds
gt_boxes[:, 1:5] = gt_rois
gt_boxes[:, 5] = entry['gt_classes'][gt_inds]
im_info = np.array([[im_height, im_width, scale]], dtype=np.float32)
blobs['im_info'].append(im_info)
# Add RPN targets
if cfg.FPN.FPN_ON and cfg.FPN.MULTILEVEL_RPN:
# RPN applied to many feature levels, as in the FPN paper
rpn_blobs = _get_rpn_blobs(
im_height, im_width, foas, all_anchors, gt_rois
)
for i, lvl in enumerate(range(k_min, k_max + 1)):
for k, v in rpn_blobs[i].items():
blobs[k + '_fpn' + str(lvl)].append(v)
else:
# Classical RPN, applied to a single feature level
rpn_blobs = _get_rpn_blobs(
im_height, im_width, [foa], all_anchors, gt_rois
)
for k, v in rpn_blobs.items():
blobs[k].append(v)
for k, v in blobs.items():
if isinstance(v, list) and len(v) > 0:
blobs[k] = np.concatenate(v)
valid_keys = [
'has_visible_keypoints', 'boxes', 'segms', 'seg_areas', 'gt_classes',
'gt_overlaps', 'is_crowd', 'box_to_gt_ind_map', 'gt_keypoints'
]
minimal_roidb = [{} for _ in range(len(roidb))]
for i, e in enumerate(roidb):
for k in valid_keys:
if k in e:
minimal_roidb[i][k] = e[k]
blobs['roidb'] = blob_utils.serialize(minimal_roidb)
# Always return valid=True, since RPN minibatches are valid by design
return True
def _get_rpn_blobs(im_height, im_width, foas, all_anchors, gt_boxes):
total_anchors = all_anchors.shape[0]
straddle_thresh = cfg.TRAIN.RPN_STRADDLE_THRESH
if straddle_thresh >= 0:
# Only keep anchors inside the image by a margin of straddle_thresh
# Set TRAIN.RPN_STRADDLE_THRESH to -1 (or a large value) to keep all
# anchors
inds_inside = np.where(
(all_anchors[:, 0] >= -straddle_thresh) &
(all_anchors[:, 1] >= -straddle_thresh) &
(all_anchors[:, 2] < im_width + straddle_thresh) &
(all_anchors[:, 3] < im_height + straddle_thresh)
)[0]
# keep only inside anchors
anchors = all_anchors[inds_inside, :]
else:
inds_inside = np.arange(all_anchors.shape[0])
anchors = all_anchors
num_inside = len(inds_inside)
logger.debug('total_anchors: {}'.format(total_anchors))
logger.debug('inds_inside: {}'.format(num_inside))
logger.debug('anchors.shape: {}'.format(anchors.shape))
# Compute anchor labels:
# label=1 is positive, 0 is negative, -1 is don't care (ignore)
labels = np.empty((num_inside, ), dtype=np.int32)
labels.fill(-1)
if len(gt_boxes) > 0:
        # Compute overlaps between the anchors and the gt boxes
anchor_by_gt_overlap = box_utils.bbox_overlaps(anchors, gt_boxes)
# Map from anchor to gt box that has highest overlap
anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(axis=1)
# For each anchor, amount of overlap with most overlapping gt box
anchor_to_gt_max = anchor_by_gt_overlap[np.arange(num_inside),
anchor_to_gt_argmax]
# Map from gt box to an anchor that has highest overlap
gt_to_anchor_argmax = anchor_by_gt_overlap.argmax(axis=0)
# For each gt box, amount of overlap with most overlapping anchor
gt_to_anchor_max = anchor_by_gt_overlap[
gt_to_anchor_argmax,
np.arange(anchor_by_gt_overlap.shape[1])
]
# Find all anchors that share the max overlap amount
# (this includes many ties)
anchors_with_max_overlap = np.where(
anchor_by_gt_overlap == gt_to_anchor_max
)[0]
# Fg label: for each gt use anchors with highest overlap
# (including ties)
labels[anchors_with_max_overlap] = 1
# Fg label: above threshold IOU
labels[anchor_to_gt_max >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
# subsample positive labels if we have too many
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCH_SIZE_PER_IM)
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) > num_fg:
disable_inds = npr.choice(
fg_inds, size=(len(fg_inds) - num_fg), replace=False
)
labels[disable_inds] = -1
fg_inds = np.where(labels == 1)[0]
# subsample negative labels if we have too many
# (samples with replacement, but since the set of bg inds is large most
# samples will not have repeats)
num_bg = cfg.TRAIN.RPN_BATCH_SIZE_PER_IM - np.sum(labels == 1)
bg_inds = np.where(anchor_to_gt_max < cfg.TRAIN.RPN_NEGATIVE_OVERLAP)[0]
if len(bg_inds) > num_bg:
enable_inds = bg_inds[npr.randint(len(bg_inds), size=num_bg)]
labels[enable_inds] = 0
bg_inds = np.where(labels == 0)[0]
bbox_targets = np.zeros((num_inside, 4), dtype=np.float32)
bbox_targets[fg_inds, :] = data_utils.compute_targets(
anchors[fg_inds, :], gt_boxes[anchor_to_gt_argmax[fg_inds], :]
)
# Bbox regression loss has the form:
# loss(x) = weight_outside * L(weight_inside * x)
# Inside weights allow us to set zero loss on an element-wise basis
    # Bbox regression is only trained on positive examples, so we set their
    # weights to 1.0 (or another value if configured differently) and set the
    # weights of all other examples to 0
bbox_inside_weights = np.zeros((num_inside, 4), dtype=np.float32)
bbox_inside_weights[labels == 1, :] = (1.0, 1.0, 1.0, 1.0)
# The bbox regression loss only averages by the number of images in the
# mini-batch, whereas we need to average by the total number of example
# anchors selected
# Outside weights are used to scale each element-wise loss so the final
# average over the mini-batch is correct
bbox_outside_weights = np.zeros((num_inside, 4), dtype=np.float32)
# uniform weighting of examples (given non-uniform sampling)
num_examples = np.sum(labels >= 0)
bbox_outside_weights[labels == 1, :] = 1.0 / num_examples
bbox_outside_weights[labels == 0, :] = 1.0 / num_examples
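    # Editor's worked example (hypothetical numbers): with
    # RPN_BATCH_SIZE_PER_IM = 256 and a fully filled sample, num_examples is
    # 256, so each sampled fg/bg anchor gets outside weight 1 / 256 and the
    # sum of weighted element-wise losses equals the mean loss over the
    # sampled anchors, i.e. loss(x) = weight_outside * L(weight_inside * x)
    # reduces to an average over examples.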
# Map up to original set of anchors
labels = data_utils.unmap(labels, total_anchors, inds_inside, fill=-1)
bbox_targets = data_utils.unmap(
bbox_targets, total_anchors, inds_inside, fill=0
)
bbox_inside_weights = data_utils.unmap(
bbox_inside_weights, total_anchors, inds_inside, fill=0
)
bbox_outside_weights = data_utils.unmap(
bbox_outside_weights, total_anchors, inds_inside, fill=0
)
# Split the generated labels, etc. into labels per each field of anchors
blobs_out = []
start_idx = 0
for foa in foas:
H = foa.field_size
W = foa.field_size
A = foa.num_cell_anchors
end_idx = start_idx + H * W * A
_labels = labels[start_idx:end_idx]
_bbox_targets = bbox_targets[start_idx:end_idx, :]
_bbox_inside_weights = bbox_inside_weights[start_idx:end_idx, :]
_bbox_outside_weights = bbox_outside_weights[start_idx:end_idx, :]
start_idx = end_idx
# labels output with shape (1, A, height, width)
_labels = _labels.reshape((1, H, W, A)).transpose(0, 3, 1, 2)
# bbox_targets output with shape (1, 4 * A, height, width)
_bbox_targets = _bbox_targets.reshape(
(1, H, W, A * 4)).transpose(0, 3, 1, 2)
# bbox_inside_weights output with shape (1, 4 * A, height, width)
_bbox_inside_weights = _bbox_inside_weights.reshape(
(1, H, W, A * 4)).transpose(0, 3, 1, 2)
# bbox_outside_weights output with shape (1, 4 * A, height, width)
_bbox_outside_weights = _bbox_outside_weights.reshape(
(1, H, W, A * 4)).transpose(0, 3, 1, 2)
blobs_out.append(
dict(
rpn_labels_int32_wide=_labels,
rpn_bbox_targets_wide=_bbox_targets,
rpn_bbox_inside_weights_wide=_bbox_inside_weights,
rpn_bbox_outside_weights_wide=_bbox_outside_weights
)
)
return blobs_out[0] if len(blobs_out) == 1 else blobs_out
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from Cython.Build import cythonize
from setuptools import Extension
from setuptools import setup
import numpy as np
_NP_INCLUDE_DIRS = np.get_include()
# Extension modules
ext_modules = [
Extension(
name='utils.cython_bbox',
sources=[
'utils/cython_bbox.pyx'
],
extra_compile_args=[
'-Wno-cpp'
],
include_dirs=[
_NP_INCLUDE_DIRS
]
),
Extension(
name='utils.cython_nms',
sources=[
'utils/cython_nms.pyx'
],
extra_compile_args=[
'-Wno-cpp'
],
include_dirs=[
_NP_INCLUDE_DIRS
]
)
]
setup(
name='Detectron',
ext_modules=cythonize(ext_modules)
)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Caffe2 blob helper functions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cPickle as pickle
import cv2
import numpy as np
from caffe2.proto import caffe2_pb2
from core.config import cfg
def im_list_to_blob(ims):
"""Convert a list of images into a network input. Assumes images were
prepared using prep_im_for_blob or equivalent: i.e.
- BGR channel order
- pixel means subtracted
- resized to the desired input size
- float32 numpy ndarray format
    Output is a 4D NCHW tensor of the images concatenated along axis 0, with
    shape (num_images, 3, max_height, max_width).
"""
max_shape = np.array([im.shape for im in ims]).max(axis=0)
    # Pad the images so that their dimensions are divisible by the stride
if cfg.FPN.FPN_ON:
stride = float(cfg.FPN.COARSEST_STRIDE)
max_shape[0] = int(np.ceil(max_shape[0] / stride) * stride)
max_shape[1] = int(np.ceil(max_shape[1] / stride) * stride)
num_images = len(ims)
blob = np.zeros((num_images, max_shape[0], max_shape[1], 3),
dtype=np.float32)
for i in range(num_images):
im = ims[i]
blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
# Move channels (axis 3) to axis 1
# Axis order will become: (batch elem, channel, height, width)
channel_swap = (0, 3, 1, 2)
blob = blob.transpose(channel_swap)
return blob
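# Editor's sketch (hypothetical shapes): two prepared images of shapes
# (480, 640, 3) and (500, 600, 3) are zero-padded to the per-batch max
# (500, 640, 3) -- rounded up to the FPN coarsest stride when FPN is on --
# and stacked, so the returned blob has shape (2, 3, 500, 640) after the
# NCHW transpose.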
def prep_im_for_blob(im, pixel_means, target_sizes, max_size):
"""Prepare an image for use as a network input blob. Specially:
- Subtract per-channel pixel mean
- Convert to float32
      - Rescale to each of the specified target sizes (capped at max_size)
Returns a list of transformed images, one for each target size. Also returns
the scale factors that were used to compute each returned image.
"""
im = im.astype(np.float32, copy=False)
im -= pixel_means
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
ims = []
im_scales = []
for target_size in target_sizes:
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than max_size
if np.round(im_scale * im_size_max) > max_size:
im_scale = float(max_size) / float(im_size_max)
im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR)
ims.append(im)
im_scales.append(im_scale)
return ims, im_scales
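# Editor's worked example (hypothetical sizes): for a 480 x 640 image with
# target_size = 800 and max_size = 1333, im_scale = 800 / 480 ~= 1.667 and
# the long side becomes round(640 * 1.667) = 1067 <= 1333, so the cap does
# not trigger. A 480 x 1280 image would instead be rescaled by
# 1333 / 1280 ~= 1.041 to keep its long side within max_size.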
def zeros(shape, int32=False):
"""Return a blob of all zeros of the given shape with the correct float or
int data type.
"""
return np.zeros(shape, dtype=np.int32 if int32 else np.float32)
def ones(shape, int32=False):
"""Return a blob of all ones of the given shape with the correct float or
int data type.
"""
return np.ones(shape, dtype=np.int32 if int32 else np.float32)
def py_op_copy_blob(blob_in, blob_out):
"""Copy a numpy ndarray given as blob_in into the Caffe2 CPUTensor blob
given as blob_out. Supports float32 and int32 blob data types. This function
is intended for copying numpy data into a Caffe2 blob in PythonOps.
"""
# Some awkward voodoo required by Caffe2 to support int32 blobs
needs_int32_init = False
try:
        _ = blob_out.data.dtype  # noqa
except Exception:
needs_int32_init = blob_in.dtype == np.int32
if needs_int32_init:
# init can only take a list (failed on tuple)
blob_out.init(list(blob_in.shape), caffe2_pb2.TensorProto.INT32)
else:
blob_out.reshape(blob_in.shape)
blob_out.data[...] = blob_in
def get_loss_gradients(model, loss_blobs):
"""Generate a gradient of 1 for each loss specified in 'loss_blobs'"""
loss_gradients = {}
for b in loss_blobs:
loss_grad = model.net.ConstantFill(b, [b + '_grad'], value=1.0)
loss_gradients[str(b)] = str(loss_grad)
return loss_gradients
def serialize(obj):
"""Serialize a Python object using pickle and encode it as an array of
    float32 values so that it can be fed into the workspace. See deserialize().
"""
return np.fromstring(pickle.dumps(obj), dtype=np.uint8).astype(np.float32)
def deserialize(arr):
"""Unserialize a Python object from an array of float32 values fetched from
a workspace. See serialize().
"""
return pickle.loads(arr.astype(np.uint8).tobytes())
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Box manipulation functions. The internal Detectron box format is
[x1, y1, x2, y2] where (x1, y1) specify the top-left box corner and (x2, y2)
specify the bottom-right box corner. Boxes from external sources, e.g.,
datasets, may be in other formats (such as [x, y, w, h]) and require conversion.
This module uses a convention that may seem strange at first: the width of a box
is computed as x2 - x1 + 1 (likewise for height). The "+ 1" dates back to old
object detection days when the coordinates were integer pixel indices, rather
than floating point coordinates in a subpixel coordinate frame. A box with x2 =
x1 and y2 = y1 was taken to include a single pixel, having a width of 1, and
hence requiring the "+ 1". Now, most datasets will likely provide boxes with
floating point coordinates and the width should be more reasonably computed as
x2 - x1.
In practice, as long as a model is trained and tested with a consistent
convention either decision seems to be ok (at least in our experience on COCO).
Since we have a long history of training models with the "+ 1" convention, we
are reluctant to change it even if our modern tastes prefer not to use it.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
from core.config import cfg
import utils.cython_bbox as cython_bbox
import utils.cython_nms as cython_nms
bbox_overlaps = cython_bbox.bbox_overlaps
def boxes_area(boxes):
"""Compute the area of an array of boxes."""
w = (boxes[:, 2] - boxes[:, 0] + 1)
h = (boxes[:, 3] - boxes[:, 1] + 1)
areas = w * h
    assert np.all(areas >= 0), 'Negative areas found'
return areas
def unique_boxes(boxes, scale=1.0):
"""Return indices of unique boxes."""
v = np.array([1, 1e3, 1e6, 1e9])
hashes = np.round(boxes * scale).dot(v)
_, index = np.unique(hashes, return_index=True)
return np.sort(index)
def xywh_to_xyxy(xywh):
"""Convert [x1 y1 w h] box format to [x1 y1 x2 y2] format."""
if isinstance(xywh, (list, tuple)):
# Single box given as a list of coordinates
assert len(xywh) == 4
x1, y1 = xywh[0], xywh[1]
x2 = x1 + np.maximum(0., xywh[2] - 1.)
y2 = y1 + np.maximum(0., xywh[3] - 1.)
return (x1, y1, x2, y2)
elif isinstance(xywh, np.ndarray):
# Multiple boxes given as a 2D ndarray
return np.hstack(
(xywh[:, 0:2], xywh[:, 0:2] + np.maximum(0, xywh[:, 2:4] - 1))
)
else:
raise TypeError('Argument xywh must be a list, tuple, or numpy array.')
def xyxy_to_xywh(xyxy):
"""Convert [x1 y1 x2 y2] box format to [x1 y1 w h] format."""
if isinstance(xyxy, (list, tuple)):
# Single box given as a list of coordinates
assert len(xyxy) == 4
x1, y1 = xyxy[0], xyxy[1]
w = xyxy[2] - x1 + 1
h = xyxy[3] - y1 + 1
return (x1, y1, w, h)
elif isinstance(xyxy, np.ndarray):
# Multiple boxes given as a 2D ndarray
return np.hstack((xyxy[:, 0:2], xyxy[:, 2:4] - xyxy[:, 0:2] + 1))
else:
raise TypeError('Argument xyxy must be a list, tuple, or numpy array.')
def filter_small_boxes(boxes, min_size):
"""Keep boxes with width and height both greater than min_size."""
w = boxes[:, 2] - boxes[:, 0] + 1
h = boxes[:, 3] - boxes[:, 1] + 1
keep = np.where((w > min_size) & (h > min_size))[0]
return keep
def clip_boxes_to_image(boxes, height, width):
"""Clip an array of boxes to an image with the given height and width."""
boxes[:, [0, 2]] = np.minimum(width - 1., np.maximum(0., boxes[:, [0, 2]]))
boxes[:, [1, 3]] = np.minimum(height - 1., np.maximum(0., boxes[:, [1, 3]]))
return boxes
def clip_xyxy_to_image(x1, y1, x2, y2, height, width):
"""Clip coordinates to an image with the given height and width."""
x1 = np.minimum(width - 1., np.maximum(0., x1))
y1 = np.minimum(height - 1., np.maximum(0., y1))
x2 = np.minimum(width - 1., np.maximum(0., x2))
y2 = np.minimum(height - 1., np.maximum(0., y2))
return x1, y1, x2, y2
def clip_tiled_boxes(boxes, im_shape):
"""Clip boxes to image boundaries. im_shape is [height, width] and boxes
has shape (N, 4 * num_tiled_boxes)."""
assert boxes.shape[1] % 4 == 0, \
'boxes.shape[1] is {:d}, but must be divisible by 4.'.format(
boxes.shape[1]
)
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
def bbox_transform(boxes, deltas, weights=(1.0, 1.0, 1.0, 1.0)):
"""Forward transform that maps proposal boxes to predicted ground-truth
boxes using bounding-box regression deltas. See bbox_transform_inv for a
description of the weights argument.
"""
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
wx, wy, ww, wh = weights
dx = deltas[:, 0::4] / wx
dy = deltas[:, 1::4] / wy
dw = deltas[:, 2::4] / ww
dh = deltas[:, 3::4] / wh
# Prevent sending too large values into np.exp()
dw = np.minimum(dw, cfg.BBOX_XFORM_CLIP)
dh = np.minimum(dh, cfg.BBOX_XFORM_CLIP)
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2 (note: "- 1" is correct; don't be fooled by the asymmetry)
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
# y2 (note: "- 1" is correct; don't be fooled by the asymmetry)
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1
return pred_boxes
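# Editor's sanity check (doctest-style sketch; assumes the default config is
# loaded so cfg.BBOX_XFORM_CLIP is defined): with unit weights, all-zero
# deltas act as the identity transform:
#
#   >>> b = np.array([[0., 0., 9., 9.]])
#   >>> np.allclose(bbox_transform(b, np.zeros((1, 4))), b)
#   True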
def bbox_transform_inv(boxes, gt_boxes, weights=(1.0, 1.0, 1.0, 1.0)):
"""Inverse transform that computes target bounding-box regression deltas
given proposal boxes and ground-truth boxes. The weights argument should be
a 4-tuple of multiplicative weights that are applied to the regression
target.
In older versions of this code (and in py-faster-rcnn), the weights were set
such that the regression deltas would have unit standard deviation on the
training dataset. Presently, rather than computing these statistics exactly,
we use a fixed set of weights (10., 10., 5., 5.) by default. These are
approximately the weights one would get from COCO using the previous unit
stdev heuristic.
"""
ex_widths = boxes[:, 2] - boxes[:, 0] + 1.0
ex_heights = boxes[:, 3] - boxes[:, 1] + 1.0
ex_ctr_x = boxes[:, 0] + 0.5 * ex_widths
ex_ctr_y = boxes[:, 1] + 0.5 * ex_heights
gt_widths = gt_boxes[:, 2] - gt_boxes[:, 0] + 1.0
gt_heights = gt_boxes[:, 3] - gt_boxes[:, 1] + 1.0
gt_ctr_x = gt_boxes[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_boxes[:, 1] + 0.5 * gt_heights
wx, wy, ww, wh = weights
targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
targets_dy = wy * (gt_ctr_y - ex_ctr_y) / ex_heights
targets_dw = ww * np.log(gt_widths / ex_widths)
targets_dh = wh * np.log(gt_heights / ex_heights)
targets = np.vstack((targets_dx, targets_dy, targets_dw,
targets_dh)).transpose()
return targets
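# Editor's note: bbox_transform_inv is the inverse of bbox_transform, so for
# matching weights the round trip recovers the ground-truth boxes (sketch
# with hypothetical boxes):
#
#   >>> boxes = np.array([[0., 0., 9., 9.]])
#   >>> gt = np.array([[2., 2., 13., 13.]])
#   >>> np.allclose(bbox_transform(boxes, bbox_transform_inv(boxes, gt)), gt)
#   True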
def expand_boxes(boxes, scale):
"""Expand an array of boxes by a given scale."""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
def flip_boxes(boxes, im_width):
"""Flip boxes horizontally."""
boxes_flipped = boxes.copy()
boxes_flipped[:, 0::4] = im_width - boxes[:, 2::4] - 1
boxes_flipped[:, 2::4] = im_width - boxes[:, 0::4] - 1
return boxes_flipped
def aspect_ratio(boxes, aspect_ratio):
"""Perform width-relative aspect ratio transformation."""
boxes_ar = boxes.copy()
boxes_ar[:, 0::4] = aspect_ratio * boxes[:, 0::4]
boxes_ar[:, 2::4] = aspect_ratio * boxes[:, 2::4]
return boxes_ar
def box_voting(top_dets, all_dets, thresh, scoring_method='ID', beta=1.0):
"""Apply bounding-box voting to refine `top_dets` by voting with `all_dets`.
See: https://arxiv.org/abs/1505.01749. Optional score averaging (not in the
referenced paper) can be applied by setting `scoring_method` appropriately.
"""
    # top_dets is [N, 5] each row is [x1 y1 x2 y2, score]
    # all_dets is [N, 5] each row is [x1 y1 x2 y2, score]
top_dets_out = top_dets.copy()
top_boxes = top_dets[:, :4]
all_boxes = all_dets[:, :4]
all_scores = all_dets[:, 4]
top_to_all_overlaps = bbox_overlaps(top_boxes, all_boxes)
for k in range(top_dets_out.shape[0]):
inds_to_vote = np.where(top_to_all_overlaps[k] >= thresh)[0]
boxes_to_vote = all_boxes[inds_to_vote, :]
ws = all_scores[inds_to_vote]
top_dets_out[k, :4] = np.average(boxes_to_vote, axis=0, weights=ws)
if scoring_method == 'ID':
# Identity, nothing to do
pass
elif scoring_method == 'TEMP_AVG':
# Average probabilities (considered as P(detected class) vs.
# P(not the detected class)) after smoothing with a temperature
# hyperparameter.
P = np.vstack((ws, 1.0 - ws))
P_max = np.max(P, axis=0)
X = np.log(P / P_max)
X_exp = np.exp(X / beta)
P_temp = X_exp / np.sum(X_exp, axis=0)
P_avg = P_temp[0].mean()
top_dets_out[k, 4] = P_avg
elif scoring_method == 'AVG':
# Combine new probs from overlapping boxes
top_dets_out[k, 4] = ws.mean()
elif scoring_method == 'IOU_AVG':
P = ws
ws = top_to_all_overlaps[k, inds_to_vote]
P_avg = np.average(P, weights=ws)
top_dets_out[k, 4] = P_avg
elif scoring_method == 'GENERALIZED_AVG':
P_avg = np.mean(ws**beta)**(1.0 / beta)
top_dets_out[k, 4] = P_avg
elif scoring_method == 'QUASI_SUM':
top_dets_out[k, 4] = ws.sum() / float(len(ws))**beta
else:
raise NotImplementedError(
'Unknown scoring method {}'.format(scoring_method)
)
return top_dets_out
def nms(dets, thresh):
"""Apply classic DPM-style greedy NMS."""
if dets.shape[0] == 0:
return []
return cython_nms.nms(dets, thresh)
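# Editor's worked example (hypothetical dets): under the "+ 1" convention,
# boxes [0, 0, 9, 9] and [5, 0, 14, 9] have IoU 50 / 150 = 1/3. With scores
# 0.9 and 0.8 in dets rows [x1, y1, x2, y2, score], nms(dets, 0.3) keeps
# only the higher-scoring box (1/3 >= 0.3), while nms(dets, 0.5) keeps both.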
def soft_nms(
dets, sigma=0.5, overlap_thresh=0.3, score_thresh=0.001, method='linear'
):
"""Apply the soft NMS algorithm from https://arxiv.org/abs/1704.04503."""
if dets.shape[0] == 0:
return dets, []
methods = {'hard': 0, 'linear': 1, 'gaussian': 2}
assert method in methods, 'Unknown soft_nms method: {}'.format(method)
dets, keep = cython_nms.soft_nms(
np.ascontiguousarray(dets, dtype=np.float32),
np.float32(sigma),
np.float32(overlap_thresh),
np.float32(score_thresh),
np.uint8(methods[method])
)
return dets, keep
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Helpful utilities for working with Caffe2."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from six import string_types
import contextlib
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python import dyndep
from caffe2.python import scope
import utils.env as envu
def import_contrib_ops():
"""Import contrib ops needed by Detectron."""
envu.import_nccl_ops()
def import_detectron_ops():
"""Import Detectron ops."""
detectron_ops_lib = envu.get_detectron_ops_lib()
dyndep.InitOpsLibrary(detectron_ops_lib)
def import_custom_ops():
"""Import custom ops."""
custom_ops_lib = envu.get_custom_ops_lib()
dyndep.InitOpsLibrary(custom_ops_lib)
def SuffixNet(name, net, prefix_len, outputs):
"""Returns a new Net from the given Net (`net`) that includes only the ops
after removing the first `prefix_len` number of ops. The new Net is thus a
suffix of `net`. Blobs listed in `outputs` are registered as external output
blobs.
"""
outputs = BlobReferenceList(outputs)
for output in outputs:
assert net.BlobIsDefined(output)
new_net = net.Clone(name)
del new_net.Proto().op[:]
del new_net.Proto().external_input[:]
del new_net.Proto().external_output[:]
# Add suffix ops
new_net.Proto().op.extend(net.Proto().op[prefix_len:])
# Add external input blobs
# Treat any undefined blobs as external inputs
input_names = [
i for op in new_net.Proto().op for i in op.input
if not new_net.BlobIsDefined(i)]
new_net.Proto().external_input.extend(input_names)
# Add external output blobs
output_names = [str(o) for o in outputs]
new_net.Proto().external_output.extend(output_names)
return new_net, [new_net.GetBlobRef(o) for o in output_names]
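# Editor's usage sketch (hypothetical net and blob names): given a net whose
# first two ops build features and whose remaining ops compute 'cls_prob'
# from those features, SuffixNet('head', net, 2, ['cls_prob']) clones only
# the trailing ops, registering any blobs they read but do not produce as
# external inputs and 'cls_prob' as an external output.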
def BlobReferenceList(blob_ref_or_list):
"""Ensure that the argument is returned as a list of BlobReferences."""
if isinstance(blob_ref_or_list, core.BlobReference):
return [blob_ref_or_list]
elif type(blob_ref_or_list) in (list, tuple):
for b in blob_ref_or_list:
assert isinstance(b, core.BlobReference)
return blob_ref_or_list
else:
raise TypeError(
'blob_ref_or_list must be a BlobReference or a list/tuple of '
'BlobReferences'
)
def UnscopeName(possibly_scoped_name):
"""Remove any name scoping from a (possibly) scoped name. For example,
convert the name 'gpu_0/foo' to 'foo'."""
assert isinstance(possibly_scoped_name, string_types)
return possibly_scoped_name[
possibly_scoped_name.rfind(scope._NAMESCOPE_SEPARATOR) + 1:]
@contextlib.contextmanager
def NamedCudaScope(gpu_id):
"""Creates a GPU name scope and CUDA device scope. This function is provided
to reduce `with ...` nesting levels."""
with GpuNameScope(gpu_id):
with CudaScope(gpu_id):
yield
@contextlib.contextmanager
def GpuNameScope(gpu_id):
"""Create a name scope for GPU device `gpu_id`."""
with core.NameScope('gpu_{:d}'.format(gpu_id)):
yield
@contextlib.contextmanager
def CudaScope(gpu_id):
"""Create a CUDA device scope for GPU device `gpu_id`."""
gpu_dev = CudaDevice(gpu_id)
with core.DeviceScope(gpu_dev):
yield
@contextlib.contextmanager
def CpuScope():
"""Create a CPU device scope."""
cpu_dev = core.DeviceOption(caffe2_pb2.CPU)
with core.DeviceScope(cpu_dev):
yield
def CudaDevice(gpu_id):
"""Create a Cuda device."""
return core.DeviceOption(caffe2_pb2.CUDA, gpu_id)
def gauss_fill(std):
"""Gaussian fill helper to reduce verbosity."""
return ('GaussianFill', {'std': std})
def const_fill(value):
"""Constant fill helper to reduce verbosity."""
return ('ConstantFill', {'value': value})
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""A simple attribute dictionary used for representing configuration options."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
class AttrDict(dict):
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
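# Editor's usage sketch: attribute access and key access are interchangeable:
#
#   >>> a = AttrDict()
#   >>> a.FOO = 1   # equivalent to a['FOO'] = 1
#   >>> (a['FOO'], a.FOO)
#   (1, 1)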
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""An awesome colormap for really neat visualizations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
def colormap(rgb=False):
color_list = np.array(
[
0.000, 0.447, 0.741,
0.850, 0.325, 0.098,
0.929, 0.694, 0.125,
0.494, 0.184, 0.556,
0.466, 0.674, 0.188,
0.301, 0.745, 0.933,
0.635, 0.078, 0.184,
0.300, 0.300, 0.300,
0.600, 0.600, 0.600,
1.000, 0.000, 0.000,
1.000, 0.500, 0.000,
0.749, 0.749, 0.000,
0.000, 1.000, 0.000,
0.000, 0.000, 1.000,
0.667, 0.000, 1.000,
0.333, 0.333, 0.000,
0.333, 0.667, 0.000,
0.333, 1.000, 0.000,
0.667, 0.333, 0.000,
0.667, 0.667, 0.000,
0.667, 1.000, 0.000,
1.000, 0.333, 0.000,
1.000, 0.667, 0.000,
1.000, 1.000, 0.000,
0.000, 0.333, 0.500,
0.000, 0.667, 0.500,
0.000, 1.000, 0.500,
0.333, 0.000, 0.500,
0.333, 0.333, 0.500,
0.333, 0.667, 0.500,
0.333, 1.000, 0.500,
0.667, 0.000, 0.500,
0.667, 0.333, 0.500,
0.667, 0.667, 0.500,
0.667, 1.000, 0.500,
1.000, 0.000, 0.500,
1.000, 0.333, 0.500,
1.000, 0.667, 0.500,
1.000, 1.000, 0.500,
0.000, 0.333, 1.000,
0.000, 0.667, 1.000,
0.000, 1.000, 1.000,
0.333, 0.000, 1.000,
0.333, 0.333, 1.000,
0.333, 0.667, 1.000,
0.333, 1.000, 1.000,
0.667, 0.000, 1.000,
0.667, 0.333, 1.000,
0.667, 0.667, 1.000,
0.667, 1.000, 1.000,
1.000, 0.000, 1.000,
1.000, 0.333, 1.000,
1.000, 0.667, 1.000,
0.167, 0.000, 0.000,
0.333, 0.000, 0.000,
0.500, 0.000, 0.000,
0.667, 0.000, 0.000,
0.833, 0.000, 0.000,
1.000, 0.000, 0.000,
0.000, 0.167, 0.000,
0.000, 0.333, 0.000,
0.000, 0.500, 0.000,
0.000, 0.667, 0.000,
0.000, 0.833, 0.000,
0.000, 1.000, 0.000,
0.000, 0.000, 0.167,
0.000, 0.000, 0.333,
0.000, 0.000, 0.500,
0.000, 0.000, 0.667,
0.000, 0.000, 0.833,
0.000, 0.000, 1.000,
0.000, 0.000, 0.000,
0.143, 0.143, 0.143,
0.286, 0.286, 0.286,
0.429, 0.429, 0.429,
0.571, 0.571, 0.571,
0.714, 0.714, 0.714,
0.857, 0.857, 0.857,
1.000, 1.000, 1.000
]
).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Coordinated access to a shared multithreading/processing queue."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import contextlib
import logging
import Queue
import threading
import traceback
log = logging.getLogger(__name__)
class Coordinator(object):
def __init__(self):
self._event = threading.Event()
def request_stop(self):
log.debug('Coordinator stopping')
self._event.set()
def should_stop(self):
return self._event.is_set()
def wait_for_stop(self):
return self._event.wait()
@contextlib.contextmanager
def stop_on_exception(self):
try:
yield
except Exception:
if not self.should_stop():
traceback.print_exc()
self.request_stop()
def coordinated_get(coordinator, queue):
while not coordinator.should_stop():
try:
return queue.get(block=True, timeout=1.0)
except Queue.Empty:
continue
raise Exception('Coordinator stopped during get()')
def coordinated_put(coordinator, queue, element):
while not coordinator.should_stop():
try:
queue.put(element, block=True, timeout=1.0)
return
except Queue.Full:
continue
raise Exception('Coordinator stopped during put()')
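# Editor's usage sketch (hypothetical queue and process function): a worker
# thread typically wraps its loop in stop_on_exception so that one failing
# thread requests a stop and unblocks its peers:
#
#   coord = Coordinator()
#   with coord.stop_on_exception():
#       while not coord.should_stop():
#           item = coordinated_get(coord, queue)
#           process(item)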
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
DTYPE = np.float32
ctypedef np.float32_t DTYPE_t
@cython.boundscheck(False)
def bbox_overlaps(
np.ndarray[DTYPE_t, ndim=2] boxes,
np.ndarray[DTYPE_t, ndim=2] query_boxes):
"""
Parameters
----------
boxes: (N, 4) ndarray of float
query_boxes: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
cdef unsigned int N = boxes.shape[0]
cdef unsigned int K = query_boxes.shape[0]
cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
cdef DTYPE_t iw, ih, box_area
cdef DTYPE_t ua
cdef unsigned int k, n
with nogil:
for k in range(K):
box_area = (
(query_boxes[k, 2] - query_boxes[k, 0] + 1) *
(query_boxes[k, 3] - query_boxes[k, 1] + 1)
)
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0]) + 1
)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1]) + 1
)
if ih > 0:
ua = float(
(boxes[n, 2] - boxes[n, 0] + 1) *
(boxes[n, 3] - boxes[n, 1] + 1) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
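# Editor's worked example (hypothetical boxes, integer "+ 1" convention):
# identical 10 x 10 boxes give IoU 1.0, while [0, 0, 9, 9] vs [5, 0, 14, 9]
# intersect with iw = 9 - 5 + 1 = 5 and ih = 10, so
# overlaps = 50 / (100 + 100 - 50) = 1/3.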
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
cdef inline np.float32_t max(np.float32_t a, np.float32_t b) nogil:
return a if a >= b else b
cdef inline np.float32_t min(np.float32_t a, np.float32_t b) nogil:
return a if a <= b else b
@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
def nms(np.ndarray[np.float32_t, ndim=2] dets, np.float32_t thresh):
cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]
cdef int ndets = dets.shape[0]
cdef np.ndarray[np.int_t, ndim=1] suppressed = \
np.zeros((ndets), dtype=np.int)
# nominal indices
cdef int _i, _j
# sorted indices
cdef int i, j
# temp variables for box i's (the box currently under consideration)
cdef np.float32_t ix1, iy1, ix2, iy2, iarea
# variables for computing overlap with box j (lower scoring box)
cdef np.float32_t xx1, yy1, xx2, yy2
cdef np.float32_t w, h
cdef np.float32_t inter, ovr
with nogil:
for _i in range(ndets):
i = order[_i]
if suppressed[i] == 1:
continue
ix1 = x1[i]
iy1 = y1[i]
ix2 = x2[i]
iy2 = y2[i]
iarea = areas[i]
for _j in range(_i + 1, ndets):
j = order[_j]
if suppressed[j] == 1:
continue
xx1 = max(ix1, x1[j])
yy1 = max(iy1, y1[j])
xx2 = min(ix2, x2[j])
yy2 = min(iy2, y2[j])
w = max(0.0, xx2 - xx1 + 1)
h = max(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (iarea + areas[j] - inter)
if ovr >= thresh:
suppressed[j] = 1
return np.where(suppressed == 0)[0]
# ----------------------------------------------------------
# Soft-NMS: Improving Object Detection With One Line of Code
# Copyright (c) University of Maryland, College Park
# Licensed under The MIT License [see LICENSE for details]
# Written by Navaneeth Bodla and Bharat Singh
# ----------------------------------------------------------
@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
def soft_nms(
np.ndarray[float, ndim=2] boxes_in,
float sigma=0.5,
float Nt=0.3,
float threshold=0.001,
unsigned int method=0
):
boxes = boxes_in.copy()
cdef unsigned int N = boxes.shape[0]
cdef float iw, ih, box_area
cdef float ua
cdef int pos = 0
cdef float maxscore = 0
cdef int maxpos = 0
cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, area, weight, ov
inds = np.arange(N)
for i in range(N):
maxscore = boxes[i, 4]
maxpos = i
tx1 = boxes[i,0]
ty1 = boxes[i,1]
tx2 = boxes[i,2]
ty2 = boxes[i,3]
ts = boxes[i,4]
ti = inds[i]
pos = i + 1
# get max box
while pos < N:
if maxscore < boxes[pos, 4]:
maxscore = boxes[pos, 4]
maxpos = pos
pos = pos + 1
# add max box as a detection
boxes[i,0] = boxes[maxpos,0]
boxes[i,1] = boxes[maxpos,1]
boxes[i,2] = boxes[maxpos,2]
boxes[i,3] = boxes[maxpos,3]
boxes[i,4] = boxes[maxpos,4]
inds[i] = inds[maxpos]
# swap ith box with position of max box
boxes[maxpos,0] = tx1
boxes[maxpos,1] = ty1
boxes[maxpos,2] = tx2
boxes[maxpos,3] = ty2
boxes[maxpos,4] = ts
inds[maxpos] = ti
tx1 = boxes[i,0]
ty1 = boxes[i,1]
tx2 = boxes[i,2]
ty2 = boxes[i,3]
ts = boxes[i,4]
pos = i + 1
# NMS iterations, note that N changes if detection boxes fall below
# threshold
while pos < N:
x1 = boxes[pos, 0]
y1 = boxes[pos, 1]
x2 = boxes[pos, 2]
y2 = boxes[pos, 3]
s = boxes[pos, 4]
area = (x2 - x1 + 1) * (y2 - y1 + 1)
iw = (min(tx2, x2) - max(tx1, x1) + 1)
if iw > 0:
ih = (min(ty2, y2) - max(ty1, y1) + 1)
if ih > 0:
ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
                    ov = iw * ih / ua  # IoU between max box and detection box
if method == 1: # linear
if ov > Nt:
weight = 1 - ov
else:
weight = 1
elif method == 2: # gaussian
weight = np.exp(-(ov * ov)/sigma)
else: # original NMS
if ov > Nt:
weight = 0
else:
weight = 1
boxes[pos, 4] = weight*boxes[pos, 4]
                    # if box score falls below threshold, discard the box by
                    # swapping it with the last box and update N
if boxes[pos, 4] < threshold:
boxes[pos,0] = boxes[N-1, 0]
boxes[pos,1] = boxes[N-1, 1]
boxes[pos,2] = boxes[N-1, 2]
boxes[pos,3] = boxes[N-1, 3]
boxes[pos,4] = boxes[N-1, 4]
inds[pos] = inds[N-1]
N = N - 1
pos = pos - 1
pos = pos + 1
return boxes[:N], inds[:N]
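# Illustrative usage (not part of the original file): soft-NMS rescores rather
# than removes overlapping boxes. With method=2 (gaussian), a box overlapping
# the current top-scoring box with IoU `ov` keeps score * exp(-ov^2 / sigma),
# and boxes whose rescored score drops below `threshold` are discarded:
#
#   dets = np.array([[0, 0, 10, 10, 0.9],
#                    [1, 1, 11, 11, 0.8]], dtype=np.float32)
#   new_boxes, kept_inds = soft_nms(dets, sigma=0.5, method=2)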
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Environment helper functions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import imp
import os
import sys
def get_runtime_dir():
"""Retrieve the path to the runtime directory."""
return sys.path[0]
def get_py_bin_ext():
"""Retrieve python binary extension."""
return '.py'
def set_up_matplotlib():
"""Set matplotlib up."""
import matplotlib
# Use a non-interactive backend
matplotlib.use('Agg')
def exit_on_error():
"""Exit from a detectron tool when there's an error."""
sys.exit(1)
def import_nccl_ops():
"""Import NCCL ops."""
# There is no need to load NCCL ops since the
# NCCL dependency is built into the Caffe2 gpu lib
pass
def get_caffe2_dir():
"""Retrieve Caffe2 dir path."""
_fp, c2_path, _desc = imp.find_module('caffe2')
assert os.path.exists(c2_path), \
'Caffe2 not found at \'{}\''.format(c2_path)
c2_dir = os.path.dirname(os.path.abspath(c2_path))
return c2_dir
def get_detectron_ops_lib():
"""Retrieve Detectron ops library."""
c2_dir = get_caffe2_dir()
detectron_ops_lib = os.path.join(
c2_dir, 'lib/libcaffe2_detectron_ops_gpu.so')
assert os.path.exists(detectron_ops_lib), \
('Detectron ops lib not found at \'{}\'; make sure that your Caffe2 '
'version includes Detectron module').format(detectron_ops_lib)
return detectron_ops_lib
def get_custom_ops_lib():
"""Retrieve custom ops library."""
lib_dir, _utils = os.path.split(os.path.dirname(__file__))
custom_ops_lib = os.path.join(
lib_dir, 'build/libcaffe2_detectron_custom_ops_gpu.so')
assert os.path.exists(custom_ops_lib), \
'Custom ops lib not found at \'{}\''.format(custom_ops_lib)
return custom_ops_lib
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Image helper functions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cv2
import numpy as np
def aspect_ratio_rel(im, aspect_ratio):
"""Performs width-relative aspect ratio transformation."""
im_h, im_w = im.shape[:2]
im_ar_w = int(round(aspect_ratio * im_w))
im_ar = cv2.resize(im, dsize=(im_ar_w, im_h))
return im_ar
def aspect_ratio_abs(im, aspect_ratio):
"""Performs absolute aspect ratio transformation."""
im_h, im_w = im.shape[:2]
im_area = im_h * im_w
im_ar_w = np.sqrt(im_area * aspect_ratio)
im_ar_h = np.sqrt(im_area / aspect_ratio)
assert np.isclose(im_ar_w / im_ar_h, aspect_ratio)
im_ar = cv2.resize(im, dsize=(int(im_ar_w), int(im_ar_h)))
return im_ar
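# Worked example (not part of the original file): for an image with im_h=200,
# im_w=100 and aspect_ratio=2.0, im_area=20000, so
# im_ar_w = sqrt(20000 * 2) = 200 and im_ar_h = sqrt(20000 / 2) = 100: the
# pixel area is preserved while the width/height ratio becomes exactly 2.0.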
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""IO utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cPickle as pickle
import hashlib
import logging
import os
import re
import sys
import urllib2
logger = logging.getLogger(__name__)
_DETECTRON_S3_BASE_URL = 'https://s3-us-west-2.amazonaws.com/detectron'
def save_object(obj, file_name):
"""Save a Python object by pickling it."""
file_name = os.path.abspath(file_name)
with open(file_name, 'wb') as f:
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
def cache_url(url_or_file, cache_dir):
"""Download the file specified by the URL to the cache_dir and return the
path to the cached file. If the argument is not a URL, simply return it as
is.
"""
is_url = re.match(r'^(?:http)s?://', url_or_file, re.IGNORECASE) is not None
if not is_url:
return url_or_file
url = url_or_file
assert url.startswith(_DETECTRON_S3_BASE_URL), \
('Detectron only automatically caches URLs in the Detectron S3 '
'bucket: {}').format(_DETECTRON_S3_BASE_URL)
cache_file_path = url.replace(_DETECTRON_S3_BASE_URL, cache_dir)
if os.path.exists(cache_file_path):
assert_cache_file_is_ok(url, cache_file_path)
return cache_file_path
cache_file_dir = os.path.dirname(cache_file_path)
if not os.path.exists(cache_file_dir):
os.makedirs(cache_file_dir)
logger.info('Downloading remote file {} to {}'.format(url, cache_file_path))
download_url(url, cache_file_path)
assert_cache_file_is_ok(url, cache_file_path)
return cache_file_path
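# Illustrative usage (not part of the original file; the file name below is
# hypothetical):
#
#   path = cache_url(
#       'https://s3-us-west-2.amazonaws.com/detectron/some_model.pkl',
#       '/tmp/detectron-download-cache')
#   # -> '/tmp/detectron-download-cache/some_model.pkl', downloaded on first
#   # use and md5-verified against the reference some_model.pkl.md5sum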
def assert_cache_file_is_ok(url, file_path):
"""Check that cache file has the correct hash."""
# File is already in the cache, verify that the md5sum matches and
# return local path
cache_file_md5sum = _get_file_md5sum(file_path)
ref_md5sum = _get_reference_md5sum(url)
assert cache_file_md5sum == ref_md5sum, \
('Target URL {} appears to be downloaded to the local cache file '
'{}, but the md5 hash of the local file does not match the '
'reference (actual: {} vs. expected: {}). You may wish to delete '
'the cached file and try again to trigger automatic '
'download.').format(url, file_path, cache_file_md5sum, ref_md5sum)
def _progress_bar(count, total):
"""Report download progress.
Credit:
https://stackoverflow.com/questions/3173320/text-progress-bar-in-the-console/27871113
"""
bar_len = 60
filled_len = int(round(bar_len * count / float(total)))
percents = round(100.0 * count / float(total), 1)
bar = '=' * filled_len + '-' * (bar_len - filled_len)
sys.stdout.write(
' [{}] {}% of {:.1f}MB file \r'.
format(bar, percents, total / 1024 / 1024)
)
sys.stdout.flush()
if count >= total:
sys.stdout.write('\n')
def download_url(
url, dst_file_path, chunk_size=8192, progress_hook=_progress_bar
):
"""Download url and write it to dst_file_path.
Credit:
https://stackoverflow.com/questions/2028517/python-urllib2-progress-hook
"""
response = urllib2.urlopen(url)
total_size = response.info().getheader('Content-Length').strip()
total_size = int(total_size)
bytes_so_far = 0
with open(dst_file_path, 'wb') as f:
while 1:
chunk = response.read(chunk_size)
bytes_so_far += len(chunk)
if not chunk:
break
if progress_hook:
progress_hook(bytes_so_far, total_size)
f.write(chunk)
return bytes_so_far
def _get_file_md5sum(file_name):
"""Compute the md5 hash of a file."""
hash_obj = hashlib.md5()
    with open(file_name, 'rb') as f:  # binary read for stable hashing
hash_obj.update(f.read())
return hash_obj.hexdigest()
def _get_reference_md5sum(url):
"""By convention the md5 hash for url is stored in url + '.md5sum'."""
url_md5sum = url + '.md5sum'
md5sum = urllib2.urlopen(url_md5sum).read().strip()
return md5sum
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Keypoint utilities (somewhat specific to COCO keypoints)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cv2
import numpy as np
from core.config import cfg
import utils.blob as blob_utils
def get_keypoints():
"""Get the COCO keypoints and their left/right flip coorespondence map."""
# Keypoints are not available in the COCO json for the test split, so we
# provide them here.
keypoints = [
'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'
]
keypoint_flip_map = {
'left_eye': 'right_eye',
'left_ear': 'right_ear',
'left_shoulder': 'right_shoulder',
'left_elbow': 'right_elbow',
'left_wrist': 'right_wrist',
'left_hip': 'right_hip',
'left_knee': 'right_knee',
'left_ankle': 'right_ankle'
}
return keypoints, keypoint_flip_map
def get_person_class_index():
"""Index of the person class in COCO."""
return 1
def flip_keypoints(keypoints, keypoint_flip_map, keypoint_coords, width):
"""Left/right flip keypoint_coords. keypoints and keypoint_flip_map are
accessible from get_keypoints().
"""
flipped_kps = keypoint_coords.copy()
for lkp, rkp in keypoint_flip_map.items():
lid = keypoints.index(lkp)
rid = keypoints.index(rkp)
flipped_kps[:, :, lid] = keypoint_coords[:, :, rid]
flipped_kps[:, :, rid] = keypoint_coords[:, :, lid]
# Flip x coordinates
flipped_kps[:, 0, :] = width - flipped_kps[:, 0, :] - 1
# Maintain COCO convention that if visibility == 0, then x, y = 0
inds = np.where(flipped_kps[:, 2, :] == 0)
flipped_kps[inds[0], 0, inds[1]] = 0
return flipped_kps
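# Worked example (not part of the original file): for an image of width 100,
# a visible left_wrist at x=30 is swapped with right_wrist and its flipped x
# becomes 100 - 30 - 1 = 69; keypoints with visibility 0 keep x = 0 per the
# COCO convention enforced above.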
def flip_heatmaps(heatmaps):
"""Flip heatmaps horizontally."""
keypoints, flip_map = get_keypoints()
heatmaps_flipped = heatmaps.copy()
for lkp, rkp in flip_map.items():
lid = keypoints.index(lkp)
rid = keypoints.index(rkp)
heatmaps_flipped[:, rid, :, :] = heatmaps[:, lid, :, :]
heatmaps_flipped[:, lid, :, :] = heatmaps[:, rid, :, :]
heatmaps_flipped = heatmaps_flipped[:, :, :, ::-1]
return heatmaps_flipped
def heatmaps_to_keypoints(maps, rois):
"""Extract predicted keypoint locations from heatmaps. Output has shape
(#rois, 4, #keypoints) with the 4 rows corresponding to (x, y, logit, prob)
for each keypoint.
"""
# This function converts a discrete image coordinate in a HEATMAP_SIZE x
# HEATMAP_SIZE image to a continuous keypoint coordinate. We maintain
# consistency with keypoints_to_heatmap_labels by using the conversion from
# Heckbert 1990: c = d + 0.5, where d is a discrete coordinate and c is a
# continuous coordinate.
offset_x = rois[:, 0]
offset_y = rois[:, 1]
widths = rois[:, 2] - rois[:, 0]
heights = rois[:, 3] - rois[:, 1]
widths = np.maximum(widths, 1)
heights = np.maximum(heights, 1)
widths_ceil = np.ceil(widths)
heights_ceil = np.ceil(heights)
# NCHW to NHWC for use with OpenCV
maps = np.transpose(maps, [0, 2, 3, 1])
min_size = cfg.KRCNN.INFERENCE_MIN_SIZE
xy_preds = np.zeros(
(len(rois), 4, cfg.KRCNN.NUM_KEYPOINTS), dtype=np.float32)
for i in range(len(rois)):
if min_size > 0:
roi_map_width = int(np.maximum(widths_ceil[i], min_size))
roi_map_height = int(np.maximum(heights_ceil[i], min_size))
else:
roi_map_width = widths_ceil[i]
roi_map_height = heights_ceil[i]
width_correction = widths[i] / roi_map_width
height_correction = heights[i] / roi_map_height
roi_map = cv2.resize(
maps[i], (roi_map_width, roi_map_height),
interpolation=cv2.INTER_CUBIC)
# Bring back to CHW
roi_map = np.transpose(roi_map, [2, 0, 1])
roi_map_probs = scores_to_probs(roi_map.copy())
w = roi_map.shape[2]
for k in range(cfg.KRCNN.NUM_KEYPOINTS):
pos = roi_map[k, :, :].argmax()
x_int = pos % w
y_int = (pos - x_int) // w
assert (roi_map_probs[k, y_int, x_int] ==
roi_map_probs[k, :, :].max())
x = (x_int + 0.5) * width_correction
y = (y_int + 0.5) * height_correction
xy_preds[i, 0, k] = x + offset_x[i]
xy_preds[i, 1, k] = y + offset_y[i]
xy_preds[i, 2, k] = roi_map[k, y_int, x_int]
xy_preds[i, 3, k] = roi_map_probs[k, y_int, x_int]
return xy_preds
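# Worked example of the coordinate conversion above (not part of the original
# file): for an RoI starting at offset_x=100 with width 111.5, the heatmap is
# upsampled to roi_map_width = ceil(111.5) = 112 columns and
# width_correction = 111.5 / 112; an argmax at discrete column x_int = 10 then
# maps to the continuous image coordinate
# (10 + 0.5) * 111.5 / 112 + 100 ~= 110.45 (Heckbert: c = d + 0.5).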
def keypoints_to_heatmap_labels(keypoints, rois):
"""Encode keypoint location in the target heatmap for use in
SoftmaxWithLoss.
"""
# Maps keypoints from the half-open interval [x1, x2) on continuous image
# coordinates to the closed interval [0, HEATMAP_SIZE - 1] on discrete image
# coordinates. We use the continuous <-> discrete conversion from Heckbert
# 1990 ("What is the coordinate of a pixel?"): d = floor(c) and c = d + 0.5,
# where d is a discrete coordinate and c is a continuous coordinate.
assert keypoints.shape[2] == cfg.KRCNN.NUM_KEYPOINTS
shape = (len(rois), cfg.KRCNN.NUM_KEYPOINTS)
heatmaps = blob_utils.zeros(shape)
weights = blob_utils.zeros(shape)
offset_x = rois[:, 0]
offset_y = rois[:, 1]
scale_x = cfg.KRCNN.HEATMAP_SIZE / (rois[:, 2] - rois[:, 0])
scale_y = cfg.KRCNN.HEATMAP_SIZE / (rois[:, 3] - rois[:, 1])
for kp in range(keypoints.shape[2]):
vis = keypoints[:, 2, kp] > 0
x = keypoints[:, 0, kp].astype(np.float32)
y = keypoints[:, 1, kp].astype(np.float32)
# Since we use floor below, if a keypoint is exactly on the roi's right
# or bottom boundary, we shift it in by eps (conceptually) to keep it in
# the ground truth heatmap.
x_boundary_inds = np.where(x == rois[:, 2])[0]
y_boundary_inds = np.where(y == rois[:, 3])[0]
x = (x - offset_x) * scale_x
x = np.floor(x)
if len(x_boundary_inds) > 0:
x[x_boundary_inds] = cfg.KRCNN.HEATMAP_SIZE - 1
y = (y - offset_y) * scale_y
y = np.floor(y)
if len(y_boundary_inds) > 0:
y[y_boundary_inds] = cfg.KRCNN.HEATMAP_SIZE - 1
valid_loc = np.logical_and(
np.logical_and(x >= 0, y >= 0),
np.logical_and(
x < cfg.KRCNN.HEATMAP_SIZE, y < cfg.KRCNN.HEATMAP_SIZE))
valid = np.logical_and(valid_loc, vis)
valid = valid.astype(np.int32)
lin_ind = y * cfg.KRCNN.HEATMAP_SIZE + x
heatmaps[:, kp] = lin_ind * valid
weights[:, kp] = valid
return heatmaps, weights
def scores_to_probs(scores):
"""Transforms CxHxW of scores to probabilities spatially."""
channels = scores.shape[0]
for c in range(channels):
temp = scores[c, :, :]
max_score = temp.max()
temp = np.exp(temp - max_score) / np.sum(np.exp(temp - max_score))
scores[c, :, :] = temp
return scores
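# Sanity check (not part of the original file): each channel is a spatial
# softmax, so a channel of all-equal scores becomes uniform:
#
#   s = np.zeros((1, 2, 2), dtype=np.float32)
#   scores_to_probs(s)  # -> every entry is 0.25; each channel sums to 1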
def nms_oks(kp_predictions, rois, thresh):
"""Nms based on kp predictions."""
scores = np.mean(kp_predictions[:, 2, :], axis=1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
ovr = compute_oks(
kp_predictions[i], rois[i], kp_predictions[order[1:]],
rois[order[1:]])
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
def compute_oks(src_keypoints, src_roi, dst_keypoints, dst_roi):
"""Compute OKS for predicted keypoints wrt gt_keypoints.
src_keypoints: 4xK
src_roi: 4x1
dst_keypoints: Nx4xK
dst_roi: Nx4
"""
sigmas = np.array([
.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87,
.87, .89, .89]) / 10.0
vars = (sigmas * 2)**2
# area
src_area = (src_roi[2] - src_roi[0] + 1) * (src_roi[3] - src_roi[1] + 1)
# measure the per-keypoint distance if keypoints visible
dx = dst_keypoints[:, 0, :] - src_keypoints[0, :]
dy = dst_keypoints[:, 1, :] - src_keypoints[1, :]
e = (dx**2 + dy**2) / vars / (src_area + np.spacing(1)) / 2
e = np.sum(np.exp(-e), axis=1) / e.shape[1]
return e
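# Worked example (not part of the original file): a predicted keypoint that
# coincides exactly with its counterpart contributes exp(0) = 1 to the sum,
# so identical keypoint sets yield an OKS of 1.0 regardless of src_area; a
# displaced keypoint decays the score like a Gaussian in its distance with
# bandwidth proportional to vars[k] * src_area.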
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Utilities for logging."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import deque
from email.mime.text import MIMEText
import json
import logging
import numpy as np
import smtplib
import sys
# Print lower precision floating point values than default FLOAT_REPR
json.encoder.FLOAT_REPR = lambda o: format(o, '.6f')
def log_json_stats(stats, sort_keys=True):
print('json_stats: {:s}'.format(json.dumps(stats, sort_keys=sort_keys)))
class SmoothedValue(object):
"""Track a series of values and provide access to smoothed values over a
window or the global series average.
"""
def __init__(self, window_size):
self.deque = deque(maxlen=window_size)
self.series = []
self.total = 0.0
self.count = 0
def AddValue(self, value):
self.deque.append(value)
self.series.append(value)
self.count += 1
self.total += value
def GetMedianValue(self):
return np.median(self.deque)
def GetAverageValue(self):
return np.mean(self.deque)
def GetGlobalAverageValue(self):
return self.total / self.count
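# Illustrative usage (not part of the original file):
#
#   v = SmoothedValue(window_size=20)
#   for loss in [0.9, 0.7, 0.8]:
#       v.AddValue(loss)
#   v.GetMedianValue()         # median over the window -> 0.8
#   v.GetGlobalAverageValue()  # mean over the full series -> 0.8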
def send_email(subject, body, to):
s = smtplib.SMTP('localhost')
mime = MIMEText(body)
mime['Subject'] = subject
mime['To'] = to
s.sendmail('detectron', to, mime.as_string())
def setup_logging(name):
FORMAT = '%(levelname)s %(filename)s:%(lineno)4d: %(message)s'
# Manually clear root loggers to prevent any module that may have called
# logging.basicConfig() from blocking our logging setup
logging.root.handlers = []
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(name)
return logger
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Learning rate policies."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
from core.config import cfg
def get_lr_at_iter(it):
"""Get the learning rate at iteration it according to the cfg.SOLVER
settings.
"""
lr = get_lr_func()(it)
if it < cfg.SOLVER.WARM_UP_ITERS:
method = cfg.SOLVER.WARM_UP_METHOD
if method == 'constant':
warmup_factor = cfg.SOLVER.WARM_UP_FACTOR
elif method == 'linear':
alpha = it / cfg.SOLVER.WARM_UP_ITERS
warmup_factor = cfg.SOLVER.WARM_UP_FACTOR * (1 - alpha) + alpha
else:
raise KeyError('Unknown SOLVER.WARM_UP_METHOD: {}'.format(method))
lr *= warmup_factor
return np.float32(lr)
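# Worked warm-up example (not part of the original file): with
# SOLVER.WARM_UP_METHOD = 'linear', SOLVER.WARM_UP_ITERS = 500,
# SOLVER.WARM_UP_FACTOR = 1/3 and a base policy lr of 0.02, iteration 250
# gives alpha = 0.5 and warmup_factor = (1/3) * 0.5 + 0.5 = 2/3, so
# lr = 0.02 * 2/3 ~= 0.0133; warmup_factor reaches 1 when warm up ends.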
# ---------------------------------------------------------------------------- #
# Learning rate policy functions
# ---------------------------------------------------------------------------- #
def lr_func_steps_with_lrs(cur_iter):
"""For cfg.SOLVER.LR_POLICY = 'steps_with_lrs'
Change the learning rate to specified values at specified iterations.
Example:
cfg.SOLVER.MAX_ITER: 90
cfg.SOLVER.STEPS: [0, 60, 80]
cfg.SOLVER.LRS: [0.02, 0.002, 0.0002]
for cur_iter in [0, 59] use 0.02
in [60, 79] use 0.002
in [80, inf] use 0.0002
"""
ind = get_step_index(cur_iter)
return cfg.SOLVER.LRS[ind]
def lr_func_steps_with_decay(cur_iter):
"""For cfg.SOLVER.LR_POLICY = 'steps_with_decay'
    Change the learning rate at specified iterations based on the formula
lr = base_lr * gamma ** lr_step_count.
Example:
cfg.SOLVER.MAX_ITER: 90
cfg.SOLVER.STEPS: [0, 60, 80]
cfg.SOLVER.BASE_LR: 0.02
cfg.SOLVER.GAMMA: 0.1
for cur_iter in [0, 59] use 0.02 = 0.02 * 0.1 ** 0
in [60, 79] use 0.002 = 0.02 * 0.1 ** 1
in [80, inf] use 0.0002 = 0.02 * 0.1 ** 2
"""
ind = get_step_index(cur_iter)
return cfg.SOLVER.BASE_LR * cfg.SOLVER.GAMMA ** ind
def lr_func_step(cur_iter):
"""For cfg.SOLVER.LR_POLICY = 'step'
"""
return (
cfg.SOLVER.BASE_LR *
cfg.SOLVER.GAMMA ** (cur_iter // cfg.SOLVER.STEP_SIZE))
# ---------------------------------------------------------------------------- #
# Helpers
# ---------------------------------------------------------------------------- #
def get_step_index(cur_iter):
"""Given an iteration, find which learning rate step we're at."""
assert cfg.SOLVER.STEPS[0] == 0, 'The first step should always start at 0.'
steps = cfg.SOLVER.STEPS + [cfg.SOLVER.MAX_ITER]
for ind, step in enumerate(steps): # NoQA
if cur_iter < step:
break
return ind - 1
def get_lr_func():
policy = 'lr_func_' + cfg.SOLVER.LR_POLICY
if policy not in globals():
raise NotImplementedError(
'Unknown LR policy: {}'.format(cfg.SOLVER.LR_POLICY))
else:
return globals()[policy]
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Helper functions for working with Caffe2 networks (i.e., operator graphs)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import OrderedDict
import cPickle as pickle
import logging
import numpy as np
import os
import pprint
import yaml
from caffe2.python import core
from caffe2.python import workspace
from core.config import cfg
from utils.io import save_object
import utils.c2 as c2_utils
logger = logging.getLogger(__name__)
def initialize_from_weights_file(model, weights_file, broadcast=True):
"""Initialize a model from weights stored in a pickled dictionary. If
multiple GPUs are used, the loaded weights are synchronized on all GPUs,
unless 'broadcast' is False.
"""
initialize_gpu_0_from_weights_file(model, weights_file)
if broadcast:
broadcast_parameters(model)
def initialize_gpu_0_from_weights_file(model, weights_file):
"""Initialize a network with ops on GPU 0. Note that we always use GPU 0 and
rely on proper usage of CUDA_VISIBLE_DEVICES.
"""
logger.info('Loading from: {}'.format(weights_file))
ws_blobs = workspace.Blobs()
with open(weights_file, 'r') as f:
src_blobs = pickle.load(f)
if 'cfg' in src_blobs:
saved_cfg = yaml.load(src_blobs['cfg'])
configure_bbox_reg_weights(model, saved_cfg)
if 'blobs' in src_blobs:
        # Backwards compat--the dictionary used to contain only blobs; now
        # they are stored under the 'blobs' key
src_blobs = src_blobs['blobs']
# Initialize weights on GPU 0 only
unscoped_param_names = OrderedDict() # Print these out in model order
for blob in model.params:
unscoped_param_names[c2_utils.UnscopeName(str(blob))] = True
with c2_utils.NamedCudaScope(0):
for unscoped_param_name in unscoped_param_names.keys():
if (unscoped_param_name.find(']_') >= 0 and
unscoped_param_name not in src_blobs):
# Special case for sharing initialization from a pretrained
# model:
# If a blob named '_[xyz]_foo' is in model.params and not in
# the initialization blob dictionary, then load source blob
# 'foo' into destination blob '_[xyz]_foo'
src_name = unscoped_param_name[
unscoped_param_name.find(']_') + 2:]
else:
src_name = unscoped_param_name
if src_name not in src_blobs:
logger.info('{:s} not found'.format(src_name))
continue
dst_name = core.ScopedName(unscoped_param_name)
has_momentum = src_name + '_momentum' in src_blobs
has_momentum_str = ' [+ momentum]' if has_momentum else ''
logger.info('{:s}{:} loaded from weights file into {:s}: {}'.
format(
src_name, has_momentum_str,
dst_name, src_blobs[src_name].shape))
if dst_name in ws_blobs:
# If the blob is already in the workspace, make sure that it
# matches the shape of the loaded blob
ws_blob = workspace.FetchBlob(dst_name)
assert ws_blob.shape == src_blobs[src_name].shape, \
('Workspace blob {} with shape {} does not match '
'weights file shape {}').format(
src_name,
ws_blob.shape,
src_blobs[src_name].shape)
workspace.FeedBlob(
dst_name,
src_blobs[src_name].astype(np.float32, copy=False))
if has_momentum:
workspace.FeedBlob(
dst_name + '_momentum',
src_blobs[src_name + '_momentum'].astype(
np.float32, copy=False))
# We preserve blobs that are in the weights file but not used by the current
# model. We load these into CPU memory under the '__preserve__/' namescope.
# These blobs will be stored when saving a model to a weights file. This
# feature allows for alternating optimization of Faster R-CNN in which blobs
# unused by one step can still be preserved forward and used to initialize
# another step.
for src_name in src_blobs.keys():
if (src_name not in unscoped_param_names and
not src_name.endswith('_momentum') and
src_blobs[src_name] is not None):
with c2_utils.CpuScope():
workspace.FeedBlob(
'__preserve__/{:s}'.format(src_name), src_blobs[src_name])
logger.info(
'{:s} preserved in workspace (unused)'.format(src_name))
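# Example of the sharing convention above (not part of the original file; the
# blob names are hypothetical): if model.params contains '_[mask]_fcn1_w' and
# the weights file only has 'fcn1_w', then source blob 'fcn1_w' is loaded
# into destination blob '_[mask]_fcn1_w'.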
def save_model_to_weights_file(weights_file, model):
"""Stash model weights in a dictionary and pickle them to a file. We map
GPU device scoped names to unscoped names (e.g., 'gpu_0/conv1_w' ->
'conv1_w').
"""
logger.info(
'Saving parameters and momentum to {}'.format(
os.path.abspath(weights_file)))
blobs = {}
# Save all parameters
for param in model.params:
scoped_name = str(param)
unscoped_name = c2_utils.UnscopeName(scoped_name)
if unscoped_name not in blobs:
logger.debug(' {:s} -> {:s}'.format(scoped_name, unscoped_name))
blobs[unscoped_name] = workspace.FetchBlob(scoped_name)
# Save momentum
for param in model.TrainableParams():
scoped_name = str(param) + '_momentum'
unscoped_name = c2_utils.UnscopeName(scoped_name)
if unscoped_name not in blobs:
logger.debug(' {:s} -> {:s}'.format(scoped_name, unscoped_name))
blobs[unscoped_name] = workspace.FetchBlob(scoped_name)
# Save preserved blobs
for scoped_name in workspace.Blobs():
if scoped_name.startswith('__preserve__/'):
unscoped_name = c2_utils.UnscopeName(scoped_name)
if unscoped_name not in blobs:
logger.debug(
' {:s} -> {:s} (preserved)'.format(
scoped_name, unscoped_name))
blobs[unscoped_name] = workspace.FetchBlob(scoped_name)
cfg_yaml = yaml.dump(cfg)
save_object(dict(blobs=blobs, cfg=cfg_yaml), weights_file)
def broadcast_parameters(model):
"""Copy parameter blobs from GPU 0 over the corresponding parameter blobs
on GPUs 1 through cfg.NUM_GPUS - 1.
"""
if cfg.NUM_GPUS == 1:
# no-op if only running on a single GPU
return
def _do_broadcast(all_blobs):
assert len(all_blobs) % cfg.NUM_GPUS == 0, \
('Unexpected value for NUM_GPUS. Make sure you are not '
'running single-GPU inference with NUM_GPUS > 1.')
blobs_per_gpu = int(len(all_blobs) / cfg.NUM_GPUS)
for i in range(blobs_per_gpu):
blobs = [p for p in all_blobs[i::blobs_per_gpu]]
data = workspace.FetchBlob(blobs[0])
logger.debug('Broadcasting {} to'.format(str(blobs[0])))
for i, p in enumerate(blobs[1:]):
logger.debug(' |-> {}'.format(str(p)))
with c2_utils.CudaScope(i + 1):
workspace.FeedBlob(p, data)
_do_broadcast(model.params)
_do_broadcast([b + '_momentum' for b in model.TrainableParams()])
def sum_multi_gpu_blob(blob_name):
"""Return the sum of a scalar blob held on multiple GPUs."""
val = 0
for i in range(cfg.NUM_GPUS):
val += float(workspace.FetchBlob('gpu_{}/{}'.format(i, blob_name)))
return val
def average_multi_gpu_blob(blob_name):
"""Return the average of a scalar blob held on multiple GPUs."""
return sum_multi_gpu_blob(blob_name) / cfg.NUM_GPUS
def print_net(model, namescope='gpu_0'):
"""Print the model network."""
logger.info('Printing model: {}'.format(model.net.Name()))
op_list = model.net.Proto().op
for op in op_list:
input_name = op.input
# For simplicity: only print the first output
# Not recommended if there are split layers
output_name = str(op.output[0])
op_type = op.type
op_name = op.name
if namescope is None or output_name.startswith(namescope):
# Only print the forward pass network
if output_name.find('grad') >= 0:
break
output_shape = workspace.FetchBlob(output_name).shape
first_blob = True
op_label = op_type + (op_name if op_name == '' else ':' + op_name)
suffix = ' ------- (op: {})'.format(op_label)
for j in range(len(input_name)):
if input_name[j] in model.params:
continue
input_blob = workspace.FetchBlob(input_name[j])
if isinstance(input_blob, np.ndarray):
input_shape = input_blob.shape
logger.info('{:28s}: {:20s} => {:28s}: {:20s}{}'.format(
c2_utils.UnscopeName(str(input_name[j])),
'{}'.format(input_shape),
c2_utils.UnscopeName(str(output_name)),
'{}'.format(output_shape),
suffix))
if first_blob:
first_blob = False
suffix = ' ------|'
logger.info('End of model: {}'.format(model.net.Name()))
def configure_bbox_reg_weights(model, saved_cfg):
"""Compatibility for old models trained with bounding box regression
mean/std normalization (instead of fixed weights).
"""
if 'MODEL' not in saved_cfg or 'BBOX_REG_WEIGHTS' not in saved_cfg.MODEL:
logger.warning('Model from weights file was trained before config key '
'MODEL.BBOX_REG_WEIGHTS was added. Forcing '
'MODEL.BBOX_REG_WEIGHTS = (1., 1., 1., 1.) to ensure '
'correct **inference** behavior.')
cfg.MODEL.BBOX_REG_WEIGHTS = (1., 1., 1., 1.)
logger.info('New config:')
logger.info(pprint.pformat(cfg))
assert not model.train, (
'This model was trained with an older version of the code that '
'used bounding box regression mean/std normalization. It can no '
'longer be used for training. To upgrade it to a trainable model '
'please use fb/compat/convert_bbox_reg_normalized_model.py.'
)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Functions for interacting with segmentation masks in the COCO format.
The following terms are used in this module
mask: a binary mask encoded as a 2D numpy array
segm: a segmentation mask in one of the two COCO formats (polygon or RLE)
polygon: COCO's polygon format
RLE: COCO's run length encoding format
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import pycocotools.mask as mask_util
def flip_segms(segms, height, width):
"""Left/right flip each mask in a list of masks."""
def _flip_poly(poly, width):
flipped_poly = np.array(poly)
flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
return flipped_poly.tolist()
def _flip_rle(rle, height, width):
if 'counts' in rle and type(rle['counts']) == list:
# Magic RLE format handling painfully discovered by looking at the
# COCO API showAnns function.
rle = mask_util.frPyObjects([rle], height, width)
mask = mask_util.decode(rle)
mask = mask[:, ::-1, :]
rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
return rle
flipped_segms = []
for segm in segms:
if type(segm) == list:
# Polygon format
flipped_segms.append([_flip_poly(poly, width) for poly in segm])
else:
# RLE format
assert type(segm) == dict
flipped_segms.append(_flip_rle(segm, height, width))
return flipped_segms
def polys_to_mask(polygons, height, width):
"""Convert from the COCO polygon segmentation format to a binary mask
encoded as a 2D array of data type numpy.float32. The polygon segmentation
is understood to be enclosed inside a height x width image. The resulting
mask is therefore of shape (height, width).
"""
rle = mask_util.frPyObjects(polygons, height, width)
mask = np.array(mask_util.decode(rle), dtype=np.float32)
# Flatten in case polygons was a list
mask = np.sum(mask, axis=2)
mask = np.array(mask > 0, dtype=np.float32)
return mask
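# Illustrative usage (not part of the original file): rasterize one square
# polygon into a 10 x 10 binary mask:
#
#   poly = [[2, 2, 8, 2, 8, 8, 2, 8]]  # [x1, y1, x2, y2, ...]; one polygon
#   mask = polys_to_mask(poly, 10, 10)  # float32 (10, 10) array of {0., 1.}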
def mask_to_bbox(mask):
"""Compute the tight bounding box of a binary mask."""
xs = np.where(np.sum(mask, axis=0) > 0)[0]
ys = np.where(np.sum(mask, axis=1) > 0)[0]
if len(xs) == 0 or len(ys) == 0:
return None
x0 = xs[0]
x1 = xs[-1]
y0 = ys[0]
y1 = ys[-1]
return np.array((x0, y0, x1, y1), dtype=np.float32)
def polys_to_mask_wrt_box(polygons, box, M):
"""Convert from the COCO polygon segmentation format to a binary mask
encoded as a 2D array of data type numpy.float32. The polygon segmentation
is understood to be enclosed in the given box and rasterized to an M x M
mask. The resulting mask is therefore of shape (M, M).
"""
w = box[2] - box[0]
h = box[3] - box[1]
w = np.maximum(w, 1)
h = np.maximum(h, 1)
polygons_norm = []
for poly in polygons:
p = np.array(poly, dtype=np.float32)
p[0::2] = (p[0::2] - box[0]) * M / w
p[1::2] = (p[1::2] - box[1]) * M / h
polygons_norm.append(p)
rle = mask_util.frPyObjects(polygons_norm, M, M)
mask = np.array(mask_util.decode(rle), dtype=np.float32)
# Flatten in case polygons was a list
mask = np.sum(mask, axis=2)
mask = np.array(mask > 0, dtype=np.float32)
return mask
def polys_to_boxes(polys):
"""Convert a list of polygons into an array of tight bounding boxes."""
boxes_from_polys = np.zeros((len(polys), 4), dtype=np.float32)
for i in range(len(polys)):
poly = polys[i]
x0 = min(min(p[::2]) for p in poly)
x1 = max(max(p[::2]) for p in poly)
y0 = min(min(p[1::2]) for p in poly)
y1 = max(max(p[1::2]) for p in poly)
boxes_from_polys[i, :] = [x0, y0, x1, y1]
return boxes_from_polys
def rle_mask_voting(
top_masks, all_masks, all_dets, iou_thresh, binarize_thresh, method='AVG'
):
"""Returns new masks (in correspondence with `top_masks`) by combining
multiple overlapping masks coming from the pool of `all_masks`. Two methods
for combining masks are supported: 'AVG' uses a weighted average of
overlapping mask pixels; 'UNION' takes the union of all mask pixels.
"""
if len(top_masks) == 0:
return
all_not_crowd = [False] * len(all_masks)
top_to_all_overlaps = mask_util.iou(top_masks, all_masks, all_not_crowd)
decoded_all_masks = [
np.array(mask_util.decode(rle), dtype=np.float32) for rle in all_masks
]
decoded_top_masks = [
np.array(mask_util.decode(rle), dtype=np.float32) for rle in top_masks
]
all_boxes = all_dets[:, :4].astype(np.int32)
all_scores = all_dets[:, 4]
# Fill box support with weights
mask_shape = decoded_all_masks[0].shape
mask_weights = np.zeros((len(all_masks), mask_shape[0], mask_shape[1]))
for k in range(len(all_masks)):
ref_box = all_boxes[k]
x_0 = max(ref_box[0], 0)
x_1 = min(ref_box[2] + 1, mask_shape[1])
y_0 = max(ref_box[1], 0)
y_1 = min(ref_box[3] + 1, mask_shape[0])
mask_weights[k, y_0:y_1, x_0:x_1] = all_scores[k]
mask_weights = np.maximum(mask_weights, 1e-5)
top_segms_out = []
for k in range(len(top_masks)):
# Corner case of empty mask
if decoded_top_masks[k].sum() == 0:
top_segms_out.append(top_masks[k])
continue
inds_to_vote = np.where(top_to_all_overlaps[k] >= iou_thresh)[0]
# Only matches itself
if len(inds_to_vote) == 1:
top_segms_out.append(top_masks[k])
continue
masks_to_vote = [decoded_all_masks[i] for i in inds_to_vote]
if method == 'AVG':
ws = mask_weights[inds_to_vote]
soft_mask = np.average(masks_to_vote, axis=0, weights=ws)
mask = np.array(soft_mask > binarize_thresh, dtype=np.uint8)
elif method == 'UNION':
# Any pixel that's on joins the mask
soft_mask = np.sum(masks_to_vote, axis=0)
mask = np.array(soft_mask > 1e-5, dtype=np.uint8)
else:
raise NotImplementedError('Method {} is unknown'.format(method))
rle = mask_util.encode(np.array(mask[:, :, np.newaxis], order='F'))[0]
top_segms_out.append(rle)
return top_segms_out
def rle_mask_nms(masks, dets, thresh, mode='IOU'):
"""Performs greedy non-maximum suppression based on an overlap measurement
between masks. The type of measurement is determined by `mode` and can be
either 'IOU' (standard intersection over union) or 'IOMA' (intersection over
    minimum area).
"""
if len(masks) == 0:
return []
if len(masks) == 1:
return [0]
if mode == 'IOU':
# Computes ious[m1, m2] = area(intersect(m1, m2)) / area(union(m1, m2))
all_not_crowds = [False] * len(masks)
ious = mask_util.iou(masks, masks, all_not_crowds)
elif mode == 'IOMA':
# Computes ious[m1, m2] = area(intersect(m1, m2)) / min(area(m1), area(m2))
all_crowds = [True] * len(masks)
# ious[m1, m2] = area(intersect(m1, m2)) / area(m2)
ious = mask_util.iou(masks, masks, all_crowds)
# ... = max(area(intersect(m1, m2)) / area(m2),
# area(intersect(m2, m1)) / area(m1))
ious = np.maximum(ious, ious.transpose())
elif mode == 'CONTAINMENT':
# Computes ious[m1, m2] = area(intersect(m1, m2)) / area(m2)
# Which measures how much m2 is contained inside m1
all_crowds = [True] * len(masks)
ious = mask_util.iou(masks, masks, all_crowds)
else:
raise NotImplementedError('Mode {} is unknown'.format(mode))
scores = dets[:, 4]
order = np.argsort(-scores)
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
ovr = ious[i, order[1:]]
inds_to_keep = np.where(ovr <= thresh)[0]
order = order[inds_to_keep + 1]
return keep
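# Worked example of the 'IOMA' mode (not part of the original file): if mask
# m2 is fully contained in a much larger mask m1, plain IoU can be small, but
# intersection over minimum area is area(m2) / area(m2) = 1, so 'IOMA'
# suppresses nested masks that 'IOU' mode would keep.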
def rle_masks_to_boxes(masks):
"""Computes the bounding box of each mask in a list of RLE encoded masks."""
if len(masks) == 0:
return []
decoded_masks = [
np.array(mask_util.decode(rle), dtype=np.float32) for rle in masks
]
def get_bounds(flat_mask):
inds = np.where(flat_mask > 0)[0]
return inds.min(), inds.max()
boxes = np.zeros((len(decoded_masks), 4))
keep = [True] * len(decoded_masks)
for i, mask in enumerate(decoded_masks):
if mask.sum() == 0:
keep[i] = False
continue
flat_mask = mask.sum(axis=0)
x0, x1 = get_bounds(flat_mask)
flat_mask = mask.sum(axis=1)
y0, y1 = get_bounds(flat_mask)
boxes[i, :] = (x0, y0, x1, y1)
return boxes, np.where(keep)[0]
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Primitives for running multiple single-GPU jobs in parallel over subranges of
data. These are used for running multi-GPU inference. Subprocesses are used to
avoid the GIL since inference may involve non-trivial amounts of Python code.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import yaml
import numpy as np
import subprocess
import cPickle as pickle
from six.moves import shlex_quote
from core.config import cfg
import logging
logger = logging.getLogger(__name__)
def process_in_parallel(tag, total_range_size, binary, output_dir):
"""Run the specified binary cfg.NUM_GPUS times in parallel, each time as a
subprocess that uses one GPU. The binary must accept the command line
arguments `--range {start} {end}` that specify a data processing range.
"""
# Snapshot the current cfg state in order to pass to the inference
# subprocesses
cfg_file = os.path.join(output_dir, '{}_range_config.yaml'.format(tag))
with open(cfg_file, 'w') as f:
yaml.dump(cfg, stream=f)
subprocess_env = os.environ.copy()
processes = []
subinds = np.array_split(range(total_range_size), cfg.NUM_GPUS)
for i in range(cfg.NUM_GPUS):
start = subinds[i][0]
end = subinds[i][-1] + 1
subprocess_env['CUDA_VISIBLE_DEVICES'] = str(i)
cmd = '{binary} --range {start} {end} --cfg {cfg_file} NUM_GPUS 1'
cmd = cmd.format(
binary=shlex_quote(binary),
start=int(start),
end=int(end),
cfg_file=shlex_quote(cfg_file)
)
logger.info('{} range command {}: {}'.format(tag, i, cmd))
if i == 0:
subprocess_stdout = subprocess.PIPE
else:
filename = os.path.join(
output_dir, '%s_range_%s_%s.stdout' % (tag, start, end)
)
subprocess_stdout = open(filename, 'w') # NOQA (close below)
p = subprocess.Popen(
cmd,
shell=True,
env=subprocess_env,
stdout=subprocess_stdout,
stderr=subprocess.STDOUT,
bufsize=1
)
processes.append((i, p, start, end, subprocess_stdout))
# Log output from inference processes and collate their results
outputs = []
for i, p, start, end, subprocess_stdout in processes:
log_subprocess_output(i, p, output_dir, tag, start, end)
if isinstance(subprocess_stdout, file): # NOQA (Python 2 for now)
subprocess_stdout.close()
range_file = os.path.join(
output_dir, '%s_range_%s_%s.pkl' % (tag, start, end)
)
range_data = pickle.load(open(range_file))
outputs.append(range_data)
return outputs
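# Example of the range splitting above (not part of the original file): with
# total_range_size=10 and cfg.NUM_GPUS=3, np.array_split yields subranges
# [0..3], [4..6] and [7..9], so the subprocesses receive `--range 0 4`,
# `--range 4 7` and `--range 7 10` respectively.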
def log_subprocess_output(i, p, output_dir, tag, start, end):
"""Capture the output of each subprocess and log it in the parent process.
The first subprocess's output is logged in realtime. The output from the
other subprocesses is buffered and then printed all at once (in order) when
subprocesses finish.
"""
outfile = os.path.join(
output_dir, '%s_range_%s_%s.stdout' % (tag, start, end)
)
logger.info('# ' + '-' * 76 + ' #')
logger.info(
'stdout of subprocess %s with range [%s, %s]' % (i, start + 1, end)
)
logger.info('# ' + '-' * 76 + ' #')
if i == 0:
# Stream the piped stdout from the first subprocess in realtime
with open(outfile, 'w') as f:
for line in iter(p.stdout.readline, b''):
print(line.rstrip())
f.write(str(line))
p.stdout.close()
ret = p.wait()
else:
# For subprocesses >= 1, wait and dump their log file
ret = p.wait()
with open(outfile, 'r') as f:
print(''.join(f.readlines()))
assert ret == 0, 'Range subprocess failed (exit code: {})'.format(ret)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Timing related functions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import time
class Timer(object):
"""A simple timer."""
def __init__(self):
self.reset()
def tic(self):
        # using time.time instead of time.clock because time.clock
        # does not normalize for multithreading
self.start_time = time.time()
def toc(self, average=True):
self.diff = time.time() - self.start_time
self.total_time += self.diff
self.calls += 1
self.average_time = self.total_time / self.calls
if average:
return self.average_time
else:
return self.diff
def reset(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
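# Illustrative usage (not part of the original file; do_work is hypothetical):
#
#   t = Timer()
#   for _ in range(10):
#       t.tic()
#       do_work()
#       t.toc()            # returns the running average time by default
#   print(t.average_time)  # total_time / calls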
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Detection output visualization module."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cv2
import numpy as np
import os
import pycocotools.mask as mask_util
from utils.colormap import colormap
import utils.env as envu
import utils.keypoints as keypoint_utils
# Matplotlib requires certain adjustments in some environments
# Must happen before importing matplotlib
envu.set_up_matplotlib()
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
plt.rcParams['pdf.fonttype'] = 42 # For editing in Adobe Illustrator
_GRAY = (218, 227, 218)
_GREEN = (18, 127, 15)
_WHITE = (255, 255, 255)
def kp_connections(keypoints):
kp_lines = [
[keypoints.index('left_eye'), keypoints.index('right_eye')],
[keypoints.index('left_eye'), keypoints.index('nose')],
[keypoints.index('right_eye'), keypoints.index('nose')],
[keypoints.index('right_eye'), keypoints.index('right_ear')],
[keypoints.index('left_eye'), keypoints.index('left_ear')],
[keypoints.index('right_shoulder'), keypoints.index('right_elbow')],
[keypoints.index('right_elbow'), keypoints.index('right_wrist')],
[keypoints.index('left_shoulder'), keypoints.index('left_elbow')],
[keypoints.index('left_elbow'), keypoints.index('left_wrist')],
[keypoints.index('right_hip'), keypoints.index('right_knee')],
[keypoints.index('right_knee'), keypoints.index('right_ankle')],
[keypoints.index('left_hip'), keypoints.index('left_knee')],
[keypoints.index('left_knee'), keypoints.index('left_ankle')],
[keypoints.index('right_shoulder'), keypoints.index('left_shoulder')],
[keypoints.index('right_hip'), keypoints.index('left_hip')],
]
return kp_lines
def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps):
"""Convert from the class boxes/segms/keyps format generated by the testing
code.
"""
box_list = [b for b in cls_boxes if len(b) > 0]
if len(box_list) > 0:
boxes = np.concatenate(box_list)
else:
boxes = None
if cls_segms is not None:
segms = [s for slist in cls_segms for s in slist]
else:
segms = None
if cls_keyps is not None:
keyps = [k for klist in cls_keyps for k in klist]
else:
keyps = None
classes = []
for j in range(len(cls_boxes)):
classes += [j] * len(cls_boxes[j])
return boxes, segms, keyps, classes
def get_class_string(class_index, score, dataset):
class_text = dataset.classes[class_index] if dataset is not None else \
'id{:d}'.format(class_index)
return class_text + ' {:0.2f}'.format(score).lstrip('0')
def vis_mask(img, mask, col, alpha=0.4, show_border=True, border_thick=1):
"""Visualizes a single binary mask."""
img = img.astype(np.float32)
idx = np.nonzero(mask)
img[idx[0], idx[1], :] *= 1.0 - alpha
img[idx[0], idx[1], :] += alpha * col
if show_border:
        # OpenCV 3 returns (image, contours, hierarchy), matching the usage
        # in vis_one_image below
        _, contours, _ = cv2.findContours(
            mask.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
cv2.drawContours(img, contours, -1, _WHITE, border_thick, cv2.LINE_AA)
return img.astype(np.uint8)
def vis_class(img, pos, class_str, font_scale=0.35):
"""Visualizes the class."""
x0, y0 = int(pos[0]), int(pos[1])
# Compute text size.
txt = class_str
font = cv2.FONT_HERSHEY_SIMPLEX
((txt_w, txt_h), _) = cv2.getTextSize(txt, font, font_scale, 1)
# Place text background.
back_tl = x0, y0 - int(1.3 * txt_h)
back_br = x0 + txt_w, y0
cv2.rectangle(img, back_tl, back_br, _GREEN, -1)
# Show text.
txt_tl = x0, y0 - int(0.3 * txt_h)
cv2.putText(img, txt, txt_tl, font, font_scale, _GRAY, lineType=cv2.LINE_AA)
return img
def vis_bbox(img, bbox, thick=1):
"""Visualizes a bounding box."""
(x0, y0, w, h) = bbox
x1, y1 = int(x0 + w), int(y0 + h)
x0, y0 = int(x0), int(y0)
cv2.rectangle(img, (x0, y0), (x1, y1), _GREEN, thickness=thick)
return img
def vis_keypoints(img, kps, kp_thresh=2, alpha=0.7):
"""Visualizes keypoints (adapted from vis_one_image).
kps has shape (4, #keypoints) where 4 rows are (x, y, logit, prob).
"""
dataset_keypoints, _ = keypoint_utils.get_keypoints()
kp_lines = kp_connections(dataset_keypoints)
# Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
cmap = plt.get_cmap('rainbow')
colors = [cmap(i) for i in np.linspace(0, 1, len(kp_lines) + 2)]
colors = [(c[2] * 255, c[1] * 255, c[0] * 255) for c in colors]
# Perform the drawing on a copy of the image, to allow for blending.
kp_mask = np.copy(img)
# Draw mid shoulder / mid hip first for better visualization.
mid_shoulder = (
kps[:2, dataset_keypoints.index('right_shoulder')] +
kps[:2, dataset_keypoints.index('left_shoulder')]) / 2.0
sc_mid_shoulder = np.minimum(
kps[2, dataset_keypoints.index('right_shoulder')],
kps[2, dataset_keypoints.index('left_shoulder')])
mid_hip = (
kps[:2, dataset_keypoints.index('right_hip')] +
kps[:2, dataset_keypoints.index('left_hip')]) / 2.0
sc_mid_hip = np.minimum(
kps[2, dataset_keypoints.index('right_hip')],
kps[2, dataset_keypoints.index('left_hip')])
nose_idx = dataset_keypoints.index('nose')
if sc_mid_shoulder > kp_thresh and kps[2, nose_idx] > kp_thresh:
cv2.line(
kp_mask, tuple(mid_shoulder), tuple(kps[:2, nose_idx]),
color=colors[len(kp_lines)], thickness=2, lineType=cv2.LINE_AA)
if sc_mid_shoulder > kp_thresh and sc_mid_hip > kp_thresh:
cv2.line(
kp_mask, tuple(mid_shoulder), tuple(mid_hip),
color=colors[len(kp_lines) + 1], thickness=2, lineType=cv2.LINE_AA)
# Draw the keypoints.
for l in range(len(kp_lines)):
i1 = kp_lines[l][0]
i2 = kp_lines[l][1]
p1 = kps[0, i1], kps[1, i1]
p2 = kps[0, i2], kps[1, i2]
if kps[2, i1] > kp_thresh and kps[2, i2] > kp_thresh:
cv2.line(
kp_mask, p1, p2,
color=colors[l], thickness=2, lineType=cv2.LINE_AA)
if kps[2, i1] > kp_thresh:
cv2.circle(
kp_mask, p1,
radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA)
if kps[2, i2] > kp_thresh:
cv2.circle(
kp_mask, p2,
radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA)
# Blend the keypoints.
return cv2.addWeighted(img, 1.0 - alpha, kp_mask, alpha, 0)
def vis_one_image_opencv(
im, boxes, segms=None, keypoints=None, thresh=0.9, kp_thresh=2,
show_box=False, dataset=None, show_class=False):
"""Constructs a numpy array with the detections visualized."""
if isinstance(boxes, list):
boxes, segms, keypoints, classes = convert_from_cls_format(
boxes, segms, keypoints)
if boxes is None or boxes.shape[0] == 0 or max(boxes[:, 4]) < thresh:
return im
if segms is not None:
masks = mask_util.decode(segms)
color_list = colormap()
mask_color_id = 0
# Display in largest to smallest order to reduce occlusion
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
sorted_inds = np.argsort(-areas)
for i in sorted_inds:
bbox = boxes[i, :4]
score = boxes[i, -1]
if score < thresh:
continue
# show box (off by default)
if show_box:
im = vis_bbox(
im, (bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]))
# show class (off by default)
if show_class:
class_str = get_class_string(classes[i], score, dataset)
im = vis_class(im, (bbox[0], bbox[1] - 2), class_str)
# show mask
if segms is not None and len(segms) > i:
color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1
im = vis_mask(im, masks[..., i], color_mask)
# show keypoints
if keypoints is not None and len(keypoints) > i:
im = vis_keypoints(im, keypoints[i], kp_thresh)
return im
def vis_one_image(
im, im_name, output_dir, boxes, segms=None, keypoints=None, thresh=0.9,
kp_thresh=2, dpi=200, box_alpha=0.0, dataset=None, show_class=False,
ext='pdf'):
"""Visual debugging of detections."""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
if isinstance(boxes, list):
boxes, segms, keypoints, classes = convert_from_cls_format(
boxes, segms, keypoints)
if boxes is None or boxes.shape[0] == 0 or max(boxes[:, 4]) < thresh:
return
dataset_keypoints, _ = keypoint_utils.get_keypoints()
if segms is not None:
masks = mask_util.decode(segms)
color_list = colormap(rgb=True) / 255
kp_lines = kp_connections(dataset_keypoints)
cmap = plt.get_cmap('rainbow')
colors = [cmap(i) for i in np.linspace(0, 1, len(kp_lines) + 2)]
fig = plt.figure(frameon=False)
fig.set_size_inches(im.shape[1] / dpi, im.shape[0] / dpi)
ax = plt.Axes(fig, [0., 0., 1., 1.])
ax.axis('off')
fig.add_axes(ax)
ax.imshow(im)
# Display in largest to smallest order to reduce occlusion
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
sorted_inds = np.argsort(-areas)
mask_color_id = 0
for i in sorted_inds:
bbox = boxes[i, :4]
score = boxes[i, -1]
if score < thresh:
continue
# show box (off by default)
ax.add_patch(
plt.Rectangle((bbox[0], bbox[1]),
bbox[2] - bbox[0],
bbox[3] - bbox[1],
fill=False, edgecolor='g',
linewidth=0.5, alpha=box_alpha))
if show_class:
ax.text(
bbox[0], bbox[1] - 2,
get_class_string(classes[i], score, dataset),
fontsize=3,
family='serif',
bbox=dict(
facecolor='g', alpha=0.4, pad=0, edgecolor='none'),
color='white')
# show mask
if segms is not None and len(segms) > i:
img = np.ones(im.shape)
color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1
w_ratio = .4
for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio
for c in range(3):
img[:, :, c] = color_mask[c]
e = masks[:, :, i]
_, contour, hier = cv2.findContours(
e.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
for c in contour:
polygon = Polygon(
c.reshape((-1, 2)),
fill=True, facecolor=color_mask,
edgecolor='w', linewidth=1.2,
alpha=0.5)
ax.add_patch(polygon)
# show keypoints
if keypoints is not None and len(keypoints) > i:
kps = keypoints[i]
plt.autoscale(False)
for l in range(len(kp_lines)):
i1 = kp_lines[l][0]
i2 = kp_lines[l][1]
if kps[2, i1] > kp_thresh and kps[2, i2] > kp_thresh:
x = [kps[0, i1], kps[0, i2]]
y = [kps[1, i1], kps[1, i2]]
line = plt.plot(x, y)
plt.setp(line, color=colors[l], linewidth=1.0, alpha=0.7)
if kps[2, i1] > kp_thresh:
plt.plot(
kps[0, i1], kps[1, i1], '.', color=colors[l],
markersize=3.0, alpha=0.7)
if kps[2, i2] > kp_thresh:
plt.plot(
kps[0, i2], kps[1, i2], '.', color=colors[l],
markersize=3.0, alpha=0.7)
# add mid shoulder / mid hip for better visualization
mid_shoulder = (
kps[:2, dataset_keypoints.index('right_shoulder')] +
kps[:2, dataset_keypoints.index('left_shoulder')]) / 2.0
sc_mid_shoulder = np.minimum(
kps[2, dataset_keypoints.index('right_shoulder')],
kps[2, dataset_keypoints.index('left_shoulder')])
mid_hip = (
kps[:2, dataset_keypoints.index('right_hip')] +
kps[:2, dataset_keypoints.index('left_hip')]) / 2.0
sc_mid_hip = np.minimum(
kps[2, dataset_keypoints.index('right_hip')],
kps[2, dataset_keypoints.index('left_hip')])
if (sc_mid_shoulder > kp_thresh and
kps[2, dataset_keypoints.index('nose')] > kp_thresh):
x = [mid_shoulder[0], kps[0, dataset_keypoints.index('nose')]]
y = [mid_shoulder[1], kps[1, dataset_keypoints.index('nose')]]
line = plt.plot(x, y)
plt.setp(
line, color=colors[len(kp_lines)], linewidth=1.0, alpha=0.7)
if sc_mid_shoulder > kp_thresh and sc_mid_hip > kp_thresh:
x = [mid_shoulder[0], mid_hip[0]]
y = [mid_shoulder[1], mid_hip[1]]
line = plt.plot(x, y)
plt.setp(
line, color=colors[len(kp_lines) + 1], linewidth=1.0,
alpha=0.7)
output_name = os.path.basename(im_name) + '.' + ext
fig.savefig(os.path.join(output_dir, output_name), dpi=dpi)
plt.close('all')
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
# Example usage:
# data_loader_benchmark.par \
# TRAIN.DATASETS voc_2007_trainval \
# NUM_GPUS 2 \
# TRAIN.PROPOSAL_FILES /path/to/voc_2007_trainval/proposals.pkl
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import logging
import numpy as np
import pprint
import sys
import time
from caffe2.python import core, workspace, muji
from core.config import assert_and_infer_cfg
from core.config import cfg
from core.config import merge_cfg_from_list
from core.config import merge_cfg_from_file
from datasets.roidb import combined_roidb_for_training
from roi_data.loader import RoIDataLoader
from utils.timer import Timer
import utils.logging
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--loaders', dest='num_loaders',
help='Number of data loading threads',
default=4, type=int)
parser.add_argument(
'--dequeuers', dest='num_dequeuers',
help='Number of dequeuers',
default=1, type=int)
parser.add_argument(
'--minibatch-queue-size', dest='minibatch_queue_size',
help='Size of minibatch queue',
default=64, type=int)
parser.add_argument(
'--blobs-queue-capacity', dest='blobs_queue_capacity',
default=8, type=int)
parser.add_argument(
'--num-batches', dest='num_batches',
help='Number of minibatches to run',
default=500, type=int)
parser.add_argument(
'--sleep', dest='sleep_time',
help='Seconds to sleep per batch to emulate running a network',
default=0.1, type=float)
parser.add_argument(
'--cfg', dest='cfg_file', help='optional config file', default=None,
type=str)
parser.add_argument(
'--x-factor', dest='x_factor',
help='simulate the effect of x-factor times more GPUs',
default=1, type=int)
parser.add_argument(
'--profiler', dest='profiler', help='profile minibatch load time',
action='store_true')
parser.add_argument(
'opts', help='See lib/core/config.py for all options', default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
args = parser.parse_args()
return args
def loader_loop(roi_data_loader):
load_timer = Timer()
iters = 100
for i in range(iters):
load_timer.tic()
roi_data_loader.get_next_minibatch()
load_timer.toc()
print('{:d}/{:d}: Average get_next_minibatch time: {:.3f}s'.format(
i + 1, iters, load_timer.average_time))
def main(opts):
logger = logging.getLogger(__name__)
roidb = combined_roidb_for_training(
cfg.TRAIN.DATASETS, cfg.TRAIN.PROPOSAL_FILES)
logger.info('{:d} roidb entries'.format(len(roidb)))
roi_data_loader = RoIDataLoader(
roidb,
num_loaders=opts.num_loaders,
minibatch_queue_size=opts.minibatch_queue_size,
blobs_queue_capacity=opts.blobs_queue_capacity)
blob_names = roi_data_loader.get_output_names()
net = core.Net('dequeue_net')
net.type = 'dag'
all_blobs = []
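# Create one set of scoped input blobs per GPU and add a DequeueBlobs op in
# each GPU's name/device scope so that every GPU pulls minibatches from the
# loader's shared blobs queue.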
for gpu_id in range(cfg.NUM_GPUS):
with core.NameScope('gpu_{}'.format(gpu_id)):
with core.DeviceScope(muji.OnGPU(gpu_id)):
for blob_name in blob_names:
blob = core.ScopedName(blob_name)
all_blobs.append(blob)
workspace.CreateBlob(blob)
logger.info('Creating blob: {}'.format(blob))
net.DequeueBlobs(
roi_data_loader._blobs_queue_name, blob_names)
logger.info("Protobuf:\n" + str(net.Proto()))
if opts.profiler:
import cProfile
cProfile.runctx(
'loader_loop(roi_data_loader)', globals(), locals(),
sort='cumulative')
else:
loader_loop(roi_data_loader)
roi_data_loader.register_sigint_handler()
roi_data_loader.start(prefill=True)
total_time = 0
for i in range(opts.num_batches):
start_t = time.time()
for _ in range(opts.x_factor):
workspace.RunNetOnce(net)
total_time += (time.time() - start_t) / opts.x_factor
logger.info('{:d}/{:d}: Average dequeue time: {:.3f}s [{:d}/{:d}]'.
format(i + 1, opts.num_batches, total_time / (i + 1),
roi_data_loader._minibatch_queue.qsize(),
opts.minibatch_queue_size))
# Sleep to simulate the time taken by running a little network
time.sleep(opts.sleep_time)
# To inspect:
# blobs = workspace.FetchBlobs(all_blobs)
# from IPython import embed; embed()
logger.info('Shutting down data loader (EnqueueBlob errors are ok)...')
roi_data_loader.shutdown()
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
logger = utils.logging.setup_logging(__name__)
logger.setLevel(logging.DEBUG)
logging.getLogger('roi_data.loader').setLevel(logging.INFO)
np.random.seed(cfg.RNG_SEED)
args = parse_args()
logger.info('Called with args:')
logger.info(args)
if args.cfg_file is not None:
merge_cfg_from_file(args.cfg_file)
if args.opts is not None:
merge_cfg_from_list(args.opts)
assert_and_infer_cfg()
logger.info('Running with config:')
logger.info(pprint.pformat(cfg))
main(args)
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import unittest
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python import gradient_checker
from caffe2.python import workspace
import utils.c2
import utils.logging
class BatchPermutationOpTest(unittest.TestCase):
def _run_op_test(self, X, I, check_grad=False):
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
op = core.CreateOperator('BatchPermutation', ['X', 'I'], ['Y'])
workspace.FeedBlob('X', X)
workspace.FeedBlob('I', I)
workspace.RunOperatorOnce(op)
Y = workspace.FetchBlob('Y')
if check_grad:
gc = gradient_checker.GradientChecker(
stepsize=0.1,
threshold=0.001,
device_option=core.DeviceOption(caffe2_pb2.CUDA, 0)
)
res, grad, grad_estimated = gc.CheckSimple(op, [X, I], 0, [0])
self.assertTrue(res, 'Grad check failed')
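# BatchPermutation permutes the input along the batch (first) dimension
# according to the index blob I, i.e. Y[k] = X[I[k]], so NumPy fancy
# indexing gives the reference output.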
Y_ref = X[I]
np.testing.assert_allclose(Y, Y_ref, rtol=1e-5, atol=1e-08)
def _run_speed_test(self, iters=5, N=1024):
"""This function provides an example of how to benchmark custom
operators using the Caffe2 'prof_dag' network execution type. Please
note that for 'prof_dag' to work, Caffe2 must be compiled with profiling
support using the `-DUSE_PROF=ON` option passed to `cmake` when building
Caffe2.
"""
net = core.Net('test')
net.Proto().type = 'prof_dag'
net.Proto().num_workers = 2
Y = net.BatchPermutation(['X', 'I'], 'Y')
Y_flat = net.FlattenToVec([Y], 'Y_flat')
loss = net.AveragedLoss([Y_flat], 'loss')
net.AddGradientOperators([loss])
workspace.CreateNet(net)
X = np.random.randn(N, 256, 14, 14)
for _i in range(iters):
I = np.random.permutation(N)
workspace.FeedBlob('X', X.astype(np.float32))
workspace.FeedBlob('I', I.astype(np.int32))
workspace.RunNet(net.Proto().name)
np.testing.assert_allclose(
workspace.FetchBlob('Y'), X[I], rtol=1e-5, atol=1e-08
)
def test_forward_and_gradient(self):
A = np.random.randn(2, 3, 5, 7).astype(np.float32)
I = np.array([0, 1], dtype=np.int32)
self._run_op_test(A, I, check_grad=True)
A = np.random.randn(2, 3, 5, 7).astype(np.float32)
I = np.array([1, 0], dtype=np.int32)
self._run_op_test(A, I, check_grad=True)
A = np.random.randn(10, 3, 5, 7).astype(np.float32)
I = np.array(np.random.permutation(10), dtype=np.int32)
self._run_op_test(A, I, check_grad=True)
def test_size_exceptions(self):
A = np.random.randn(2, 256, 42, 86).astype(np.float32)
I = np.array(np.random.permutation(10), dtype=np.int32)
with self.assertRaises(RuntimeError):
self._run_op_test(A, I)
# See doc string in _run_speed_test
# def test_perf(self):
# with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
# self._run_speed_test()
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
utils.c2.import_detectron_ops()
assert 'BatchPermutation' in workspace.RegisteredOperators()
utils.logging.setup_logging(__name__)
unittest.main()
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import unittest
from pycocotools import mask as COCOmask
import utils.boxes as box_utils
def random_boxes(mean_box, stdev, N):
boxes = np.random.randn(N, 4) * stdev + mean_box
return boxes.astype(dtype=np.float32)
class TestBboxTransform(unittest.TestCase):
def test_bbox_transform_and_inverse(self):
weights = (5, 5, 10, 10)
src_boxes = random_boxes([10, 10, 20, 20], 1, 10)
dst_boxes = random_boxes([10, 10, 20, 20], 1, 10)
deltas = box_utils.bbox_transform_inv(
src_boxes, dst_boxes, weights=weights
)
dst_boxes_reconstructed = box_utils.bbox_transform(
src_boxes, deltas, weights=weights
)
np.testing.assert_array_almost_equal(
dst_boxes, dst_boxes_reconstructed, decimal=5
)
def test_bbox_dataset_to_prediction_roundtrip(self):
"""Simulate the process of reading a ground-truth box from a dataset,
make predictions from proposals, convert the predictions back to the
dataset format, and then use the COCO API to compute IoU overlap between
the gt box and the predictions. These should have IoU of 1.
"""
weights = (5, 5, 10, 10)
# 1/ "read" a box from a dataset in the default (x1, y1, w, h) format
gt_xywh_box = [10, 20, 100, 150]
# 2/ convert it to our internal (x1, y1, x2, y2) format
gt_xyxy_box = box_utils.xywh_to_xyxy(gt_xywh_box)
# 3/ consider nearby proposal boxes
prop_xyxy_boxes = random_boxes(gt_xyxy_box, 10, 10)
# 4/ compute proposal-to-gt transformation deltas
deltas = box_utils.bbox_transform_inv(
prop_xyxy_boxes, np.array([gt_xyxy_box]), weights=weights
)
# 5/ use deltas to transform proposals to xyxy predicted box
pred_xyxy_boxes = box_utils.bbox_transform(
prop_xyxy_boxes, deltas, weights=weights
)
# 6/ convert xyxy predicted box to xywh predicted box
pred_xywh_boxes = box_utils.xyxy_to_xywh(pred_xyxy_boxes)
# 7/ use COCO API to compute IoU
not_crowd = [int(False)] * pred_xywh_boxes.shape[0]
ious = COCOmask.iou(pred_xywh_boxes, np.array([gt_xywh_box]), not_crowd)
np.testing.assert_array_almost_equal(ious, np.ones(ious.shape))
def test_cython_bbox_iou_against_coco_api_bbox_iou(self):
"""Check that our cython implementation of bounding box IoU overlap
matches the COCO API implementation.
"""
def _do_test(b1, b2):
# Compute IoU overlap with the cython implementation
cython_iou = box_utils.bbox_overlaps(b1, b2)
# Compute IoU overlap with the COCO API implementation
# (requires converting boxes from xyxy to xywh format)
xywh_b1 = box_utils.xyxy_to_xywh(b1)
xywh_b2 = box_utils.xyxy_to_xywh(b2)
not_crowd = [int(False)] * b2.shape[0]
coco_ious = COCOmask.iou(xywh_b1, xywh_b2, not_crowd)
# IoUs should be similar
np.testing.assert_array_almost_equal(
cython_iou, coco_ious, decimal=5
)
# Test small boxes
b1 = random_boxes([10, 10, 20, 20], 5, 10)
b2 = random_boxes([10, 10, 20, 20], 5, 10)
_do_test(b1, b2)
# Test bigger boxes
b1 = random_boxes([10, 10, 110, 20], 20, 10)
b2 = random_boxes([10, 10, 110, 20], 20, 10)
_do_test(b1, b2)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import tempfile
import unittest
import yaml
from core.config import cfg
from utils.collections import AttrDict
import core.config
import utils.logging
class TestCfg(unittest.TestCase):
def test_copy_cfg(self):
cfg2 = copy.deepcopy(cfg)
s = cfg.MODEL.TYPE
cfg2.MODEL.TYPE = 'dummy'
assert cfg.MODEL.TYPE == s
def test_merge_cfg_from_cfg(self):
# Test: merge from deepcopy
s = 'dummy0'
cfg2 = copy.deepcopy(cfg)
cfg2.MODEL.TYPE = s
core.config.merge_cfg_from_cfg(cfg2)
assert cfg.MODEL.TYPE == s
# Test: merge from yaml
s = 'dummy1'
cfg2 = yaml.load(yaml.dump(cfg))
cfg2.MODEL.TYPE = s
core.config.merge_cfg_from_cfg(cfg2)
assert cfg.MODEL.TYPE == s
# Test: merge with a valid key
s = 'dummy2'
cfg2 = AttrDict()
cfg2.MODEL = AttrDict()
cfg2.MODEL.TYPE = s
core.config.merge_cfg_from_cfg(cfg2)
assert cfg.MODEL.TYPE == s
# Test: merge with an invalid key
s = 'dummy3'
cfg2 = AttrDict()
cfg2.FOO = AttrDict()
cfg2.FOO.BAR = s
with self.assertRaises(KeyError):
core.config.merge_cfg_from_cfg(cfg2)
# Test: merge with converted type
cfg2 = AttrDict()
cfg2.TRAIN = AttrDict()
cfg2.TRAIN.SCALES = [1]
core.config.merge_cfg_from_cfg(cfg2)
assert type(cfg.TRAIN.SCALES) is tuple
assert cfg.TRAIN.SCALES[0] == 1
# Test: merge with invalid type
cfg2 = AttrDict()
cfg2.TRAIN = AttrDict()
cfg2.TRAIN.SCALES = 1
with self.assertRaises(ValueError):
core.config.merge_cfg_from_cfg(cfg2)
def test_merge_cfg_from_file(self):
with tempfile.NamedTemporaryFile() as f:
yaml.dump(cfg, f)
s = cfg.MODEL.TYPE
cfg.MODEL.TYPE = 'dummy'
assert cfg.MODEL.TYPE != s
core.config.merge_cfg_from_file(f.name)
assert cfg.MODEL.TYPE == s
def test_merge_cfg_from_list(self):
opts = [
'TRAIN.SCALES', '(100, )', 'MODEL.TYPE', u'foobar', 'NUM_GPUS', 2
]
assert len(cfg.TRAIN.SCALES) > 0
assert cfg.TRAIN.SCALES[0] != 100
assert cfg.MODEL.TYPE != 'foobar'
assert cfg.NUM_GPUS != 2
core.config.merge_cfg_from_list(opts)
assert type(cfg.TRAIN.SCALES) is tuple
assert len(cfg.TRAIN.SCALES) == 1
assert cfg.TRAIN.SCALES[0] == 100
assert cfg.MODEL.TYPE == 'foobar'
assert cfg.NUM_GPUS == 2
def test_deprecated_key_from_list(self):
# You should see logger messages like:
# "Deprecated config key (ignoring): MODEL.DILATION"
opts = ['FINAL_MSG', 'foobar', 'MODEL.DILATION', 2]
with self.assertRaises(AttributeError):
_ = cfg.FINAL_MSG # noqa
with self.assertRaises(AttributeError):
_ = cfg.MODEL.DILATION # noqa
core.config.merge_cfg_from_list(opts)
with self.assertRaises(AttributeError):
_ = cfg.FINAL_MSG # noqa
with self.assertRaises(AttributeError):
_ = cfg.MODEL.DILATION # noqa
def test_deprecated_key_from_file(self):
# You should see logger messages like:
# "Deprecated config key (ignoring): MODEL.DILATION"
with tempfile.NamedTemporaryFile() as f:
cfg2 = copy.deepcopy(cfg)
cfg2.MODEL.DILATION = 2
yaml.dump(cfg2, f)
with self.assertRaises(AttributeError):
_ = cfg.MODEL.DILATION # noqa
core.config.merge_cfg_from_file(f.name)
with self.assertRaises(AttributeError):
_ = cfg.MODEL.DILATION # noqa
def test_renamed_key_from_list(self):
# You should see logger messages like:
# "Key EXAMPLE.RENAMED.KEY was renamed to EXAMPLE.KEY;
# please update your config"
opts = ['EXAMPLE.RENAMED.KEY', 'foobar']
with self.assertRaises(AttributeError):
_ = cfg.EXAMPLE.RENAMED.KEY # noqa
with self.assertRaises(KeyError):
core.config.merge_cfg_from_list(opts)
def test_renamed_key_from_file(self):
# You should see logger messages like:
# "Key EXAMPLE.RENAMED.KEY was renamed to EXAMPLE.KEY;
# please update your config"
with tempfile.NamedTemporaryFile() as f:
cfg2 = copy.deepcopy(cfg)
cfg2.EXAMPLE = AttrDict()
cfg2.EXAMPLE.RENAMED = AttrDict()
cfg2.EXAMPLE.RENAMED.KEY = 'foobar'
yaml.dump(cfg2, f)
with self.assertRaises(AttributeError):
_ = cfg.EXAMPLE.RENAMED.KEY # noqa
with self.assertRaises(KeyError):
core.config.merge_cfg_from_file(f.name)
if __name__ == '__main__':
utils.logging.setup_logging(__name__)
unittest.main()
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import logging
import unittest
import mock
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace, muji
from core.config import cfg, assert_and_infer_cfg
from roi_data.loader import RoIDataLoader
import utils.logging
def get_roidb_blobs(roidb):
blobs = {}
blobs['data'] = np.stack([entry['data'] for entry in roidb])
return blobs, True
def get_net(data_loader, name):
logger = logging.getLogger(__name__)
blob_names = data_loader.get_output_names()
net = core.Net(name)
net.type = 'dag'
for gpu_id in range(cfg.NUM_GPUS):
with core.NameScope('gpu_{}'.format(gpu_id)):
with core.DeviceScope(muji.OnGPU(gpu_id)):
for blob_name in blob_names:
blob = core.ScopedName(blob_name)
workspace.CreateBlob(blob)
net.DequeueBlobs(
data_loader._blobs_queue_name, blob_names)
logger.info("Protobuf:\n" + str(net.Proto()))
return net
def get_roidb_sample_data(sample_data):
roidb = []
for _ in range(np.random.randint(4, 10)):
roidb.append({'data': sample_data})
return roidb
def create_loader_and_network(sample_data, name):
roidb = get_roidb_sample_data(sample_data)
loader = RoIDataLoader(roidb)
net = get_net(loader, 'dequeue_net_train')
loader.register_sigint_handler()
loader.start(prefill=False)
return loader, net
def run_net(net):
workspace.RunNetOnce(net)
gpu_dev = core.DeviceOption(caffe2_pb2.CUDA, 0)
name_scope = 'gpu_{}'.format(0)
with core.NameScope(name_scope):
with core.DeviceScope(gpu_dev):
data = workspace.FetchBlob(core.ScopedName('data'))
return data
class TestRoIDataLoader(unittest.TestCase):
@mock.patch('roi_data.loader.get_minibatch_blob_names',
return_value=[u'data'])
@mock.patch('roi_data.loader.get_minibatch', side_effect=get_roidb_blobs)
def test_two_parallel_loaders(self, _1, _2):
train_data = np.random.rand(2, 3, 3).astype(np.float32)
train_loader, train_net = create_loader_and_network(train_data,
'dequeue_net_train')
test_data = np.random.rand(2, 4, 4).astype(np.float32)
test_loader, test_net = create_loader_and_network(test_data,
'dequeue_net_test')
for _ in range(5):
data = run_net(train_net)
self.assertEqual(data[0].tolist(), train_data.tolist())
data = run_net(test_net)
self.assertEqual(data[0].tolist(), test_data.tolist())
test_loader.shutdown()
train_loader.shutdown()
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
logger = utils.logging.setup_logging(__name__)
logger.setLevel(logging.DEBUG)
logging.getLogger('roi_data.loader').setLevel(logging.INFO)
np.random.seed(cfg.RNG_SEED)
assert_and_infer_cfg()
cfg.TRAIN.ASPECT_GROUPING = False
cfg.NUM_GPUS = 2
unittest.main()
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import unittest
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python import gradient_checker
from caffe2.python import workspace
import utils.c2
import utils.logging
class SmoothL1LossTest(unittest.TestCase):
def test_forward_and_gradient(self):
Y = np.random.randn(128, 4 * 21).astype(np.float32)
Y_hat = np.random.randn(128, 4 * 21).astype(np.float32)
inside_weights = np.random.randn(128, 4 * 21).astype(np.float32)
inside_weights[inside_weights < 0] = 0
outside_weights = np.random.randn(128, 4 * 21).astype(np.float32)
outside_weights[outside_weights < 0] = 0
scale = np.random.random()
beta = np.random.random()
op = core.CreateOperator(
'SmoothL1Loss', ['Y_hat', 'Y', 'inside_weights', 'outside_weights'],
['loss'],
scale=scale,
beta=beta
)
gc = gradient_checker.GradientChecker(
stepsize=0.005,
threshold=0.005,
device_option=core.DeviceOption(caffe2_pb2.CUDA, 0)
)
res, grad, grad_estimated = gc.CheckSimple(
op, [Y_hat, Y, inside_weights, outside_weights], 0, [0]
)
self.assertTrue(
grad.shape == grad_estimated.shape,
'Fail check: grad.shape != grad_estimated.shape'
)
# To inspect the gradient and estimated gradient:
# np.set_printoptions(precision=3, suppress=True)
# print('grad:')
# print(grad)
# print('grad_estimated:')
# print(grad_estimated)
self.assertTrue(res)
if __name__ == '__main__':
utils.c2.import_detectron_ops()
assert 'SmoothL1Loss' in workspace.RegisteredOperators()
utils.logging.setup_logging(__name__)
unittest.main()
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import unittest
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python import gradient_checker
from caffe2.python import workspace
import utils.c2
import utils.logging
class SpatialNarrowAsOpTest(unittest.TestCase):
def _run_test(self, A, B, check_grad=False):
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
op = core.CreateOperator('SpatialNarrowAs', ['A', 'B'], ['C'])
workspace.FeedBlob('A', A)
workspace.FeedBlob('B', B)
workspace.RunOperatorOnce(op)
C = workspace.FetchBlob('C')
if check_grad:
gc = gradient_checker.GradientChecker(
stepsize=0.005,
threshold=0.005,
device_option=core.DeviceOption(caffe2_pb2.CUDA, 0)
)
res, grad, grad_estimated = gc.CheckSimple(op, [A, B], 0, [0])
self.assertTrue(res, 'Grad check failed')
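# SpatialNarrowAs crops A's spatial dimensions (H, W) down to match those
# of B, so slicing A to the output shape reproduces the expected result.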
dims = C.shape
C_ref = A[:dims[0], :dims[1], :dims[2], :dims[3]]
np.testing.assert_allclose(C, C_ref, rtol=1e-5, atol=1e-08)
def test_small_forward_and_gradient(self):
A = np.random.randn(2, 3, 5, 7).astype(np.float32)
B = np.random.randn(2, 3, 2, 2).astype(np.float32)
self._run_test(A, B, check_grad=True)
A = np.random.randn(2, 3, 5, 7).astype(np.float32)
B = np.random.randn(2, 3, 5).astype(np.float32)
self._run_test(A, B, check_grad=True)
def test_large_forward(self):
A = np.random.randn(2, 256, 42, 100).astype(np.float32)
B = np.random.randn(2, 256, 35, 87).astype(np.float32)
self._run_test(A, B)
A = np.random.randn(2, 256, 42, 87).astype(np.float32)
B = np.random.randn(2, 256, 35, 87).astype(np.float32)
self._run_test(A, B)
def test_size_exceptions(self):
A = np.random.randn(2, 256, 42, 86).astype(np.float32)
B = np.random.randn(2, 256, 35, 87).astype(np.float32)
with self.assertRaises(RuntimeError):
self._run_test(A, B)
A = np.random.randn(2, 255, 42, 88).astype(np.float32)
B = np.random.randn(2, 256, 35, 87).astype(np.float32)
with self.assertRaises(RuntimeError):
self._run_test(A, B)
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
utils.c2.import_detectron_ops()
assert 'SpatialNarrowAs' in workspace.RegisteredOperators()
utils.logging.setup_logging(__name__)
unittest.main()
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import unittest
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python import workspace
import utils.c2
class ZeroEvenOpTest(unittest.TestCase):
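"""Tests for the ZeroEven operator, which zeroes the values at the even
indices of a 1D tensor and passes values at odd indices through unchanged.
"""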
def _run_zero_even_op(self, X):
op = core.CreateOperator('ZeroEven', ['X'], ['Y'])
workspace.FeedBlob('X', X)
workspace.RunOperatorOnce(op)
Y = workspace.FetchBlob('Y')
return Y
def _run_zero_even_op_gpu(self, X):
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
op = core.CreateOperator('ZeroEven', ['X'], ['Y'])
workspace.FeedBlob('X', X)
workspace.RunOperatorOnce(op)
Y = workspace.FetchBlob('Y')
return Y
def test_throws_on_non_1D_arrays(self):
X = np.zeros((2, 2), dtype=np.float32)
with self.assertRaisesRegexp(RuntimeError, r'X\.ndim\(\) == 1'):
self._run_zero_even_op(X)
def test_handles_empty_arrays(self):
X = np.array([], dtype=np.float32)
Y_exp = np.copy(X)
Y_act = self._run_zero_even_op(X)
np.testing.assert_allclose(Y_act, Y_exp)
def test_sets_vals_at_even_inds_to_zero(self):
X = np.array([0, 1, 2, 3, 4], dtype=np.float32)
Y_exp = np.array([0, 1, 0, 3, 0], dtype=np.float32)
Y_act = self._run_zero_even_op(X)
np.testing.assert_allclose(Y_act[0::2], Y_exp[0::2])
def test_preserves_vals_at_odd_inds(self):
X = np.array([0, 1, 2, 3, 4], dtype=np.float32)
Y_exp = np.array([0, 1, 0, 3, 0], dtype=np.float32)
Y_act = self._run_zero_even_op(X)
np.testing.assert_allclose(Y_act[1::2], Y_exp[1::2])
def test_handles_even_length_arrays(self):
X = np.random.rand(64).astype(np.float32)
Y_exp = np.copy(X)
Y_exp[0::2] = 0.0
Y_act = self._run_zero_even_op(X)
np.testing.assert_allclose(Y_act, Y_exp)
def test_handles_odd_length_arrays(self):
X = np.random.randn(77).astype(np.float32)
Y_exp = np.copy(X)
Y_exp[0::2] = 0.0
Y_act = self._run_zero_even_op(X)
np.testing.assert_allclose(Y_act, Y_exp)
def test_gpu_throws_on_non_1D_arrays(self):
X = np.zeros((2, 2), dtype=np.float32)
with self.assertRaisesRegexp(RuntimeError, r'X\.ndim\(\) == 1'):
self._run_zero_even_op_gpu(X)
def test_gpu_handles_empty_arrays(self):
X = np.array([], dtype=np.float32)
Y_exp = np.copy(X)
Y_act = self._run_zero_even_op_gpu(X)
np.testing.assert_allclose(Y_act, Y_exp)
def test_gpu_sets_vals_at_even_inds_to_zero(self):
X = np.array([0, 1, 2, 3, 4], dtype=np.float32)
Y_exp = np.array([0, 1, 0, 3, 0], dtype=np.float32)
Y_act = self._run_zero_even_op_gpu(X)
np.testing.assert_allclose(Y_act[0::2], Y_exp[0::2])
def test_gpu_preserves_vals_at_odd_inds(self):
X = np.array([0, 1, 2, 3, 4], dtype=np.float32)
Y_exp = np.array([0, 1, 0, 3, 0], dtype=np.float32)
Y_act = self._run_zero_even_op_gpu(X)
np.testing.assert_allclose(Y_act[1::2], Y_exp[1::2])
def test_gpu_handles_even_length_arrays(self):
X = np.random.rand(64).astype(np.float32)
Y_exp = np.copy(X)
Y_exp[0::2] = 0.0
Y_act = self._run_zero_even_op_gpu(X)
np.testing.assert_allclose(Y_act, Y_exp)
def test_gpu_handles_odd_length_arrays(self):
X = np.random.randn(77).astype(np.float32)
Y_exp = np.copy(X)
Y_exp[0::2] = 0.0
Y_act = self._run_zero_even_op_gpu(X)
np.testing.assert_allclose(Y_act, Y_exp)
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
utils.c2.import_custom_ops()
assert 'ZeroEven' in workspace.RegisteredOperators()
unittest.main()
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Script to convert Selective Search proposal boxes into the Detectron proposal
file format.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cPickle as pickle
import numpy as np
import scipy.io as sio
import sys
from datasets.json_dataset import JsonDataset
if __name__ == '__main__':
dataset_name = sys.argv[1]
file_in = sys.argv[2]
file_out = sys.argv[3]
ds = JsonDataset(dataset_name)
roidb = ds.get_roidb()
raw_data = sio.loadmat(file_in)['boxes'].ravel()
assert raw_data.shape[0] == len(roidb)
boxes = []
scores = []
ids = []
for i in range(raw_data.shape[0]):
if i % 1000 == 0:
print('{}/{}'.format(i + 1, len(roidb)))
# selective search boxes are 1-indexed and (y1, x1, y2, x2)
i_boxes = raw_data[i][:, (1, 0, 3, 2)] - 1
boxes.append(i_boxes.astype(np.float32))
scores.append(np.zeros((i_boxes.shape[0]), dtype=np.float32))
ids.append(roidb[i]['id'])
with open(file_out, 'wb') as f:
pickle.dump(
dict(boxes=boxes, scores=scores, indexes=ids), f,
pickle.HIGHEST_PROTOCOL
)
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Given a full set of results (boxes, masks, or keypoints) on the 2017 COCO
test set, this script extracts the results subset that corresponds to 2017
test-dev. The test-dev subset can then be submitted to the COCO evaluation
server.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import json
import os
import sys
from datasets.dataset_catalog import ANN_FN
from datasets.dataset_catalog import DATASETS
from utils.timer import Timer
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--json', dest='json_file',
help='detections json file',
default='', type=str)
parser.add_argument(
'--output-dir', dest='output_dir',
help='output directory',
default='/tmp', type=str)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
args = parser.parse_args()
return args
def convert(json_file, output_dir):
print('Reading: {}'.format(json_file))
with open(json_file, 'r') as fid:
dt = json.load(fid)
print('done!')
test_image_info = DATASETS['coco_2017_test'][ANN_FN]
with open(test_image_info, 'r') as fid:
info_test = json.load(fid)
image_test = info_test['images']
image_test_id = [i['id'] for i in image_test]
print('{} has {} images'.format(test_image_info, len(image_test_id)))
test_dev_image_info = DATASETS['coco_2017_test-dev'][ANN_FN]
with open(test_dev_image_info, 'r') as fid:
info_testdev = json.load(fid)
image_testdev = info_testdev['images']
image_testdev_id = [i['id'] for i in image_testdev]
print('{} has {} images'.format(test_dev_image_info, len(image_testdev_id)))
dt_testdev = []
print('Filtering test-dev from test...')
t = Timer()
t.tic()
for i in range(len(dt)):
if i % 1000 == 0:
print('{}/{}'.format(i, len(dt)))
if dt[i]['image_id'] in image_testdev_id:
dt_testdev.append(dt[i])
print('Done filtering ({:.2f}s)!'.format(t.toc()))
filename, file_extension = os.path.splitext(os.path.basename(json_file))
filename = filename + '_test-dev'
filename = os.path.join(output_dir, filename + file_extension)
with open(filename, 'w') as fid:
json.dump(dt_testdev, fid)
print('Done writing: {}!'.format(filename))
if __name__ == '__main__':
opts = parse_args()
convert(opts.json_file, opts.output_dir)
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Perform inference on a single image or all images with a certain extension
(e.g., .jpg) in a folder. Allows for using a combination of multiple models.
For example, one model may be used for RPN, another model for Fast R-CNN style
box detection, yet another model to predict masks, and yet another model to
predict keypoints.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import cv2 # NOQA (Must import before importing caffe2 due to bug in cv2)
import os
import sys
import yaml
from caffe2.python import workspace
from core.config import assert_and_infer_cfg
from core.config import cfg
from core.config import merge_cfg_from_cfg
from core.config import merge_cfg_from_file
import core.rpn_generator as rpn_engine
import core.test_engine as model_engine
import datasets.dummy_datasets as dummy_datasets
import utils.c2 as c2_utils
import utils.logging
import utils.vis as vis_utils
c2_utils.import_detectron_ops()
# OpenCL may be enabled by default in OpenCV3; disable it because it's not
# thread safe and causes unwanted GPU memory allocations.
cv2.ocl.setUseOpenCL(False)
# infer.py
# --im [path/to/image.jpg]
#   --rpn-pkl [path/to/rpn/model.pkl]
#   --rpn-cfg [path/to/rpn/config.yaml]
# [model1] [config1] [model2] [config2] ...
def parse_args():
parser = argparse.ArgumentParser(description='Inference on an image')
parser.add_argument(
'--im', dest='im_file', help='input image', default=None, type=str
)
parser.add_argument(
'--rpn-pkl',
dest='rpn_pkl',
help='rpn model file (pkl)',
default=None,
type=str
)
parser.add_argument(
'--rpn-cfg',
dest='rpn_cfg',
help='rpn cfg file (yaml)',
default=None,
type=str
)
parser.add_argument(
'--output-dir',
dest='output_dir',
help='directory for visualization pdfs (default: /tmp/infer)',
default='/tmp/infer',
type=str
)
parser.add_argument(
'models_to_run',
help='list of pkl, yaml pairs',
default=None,
nargs=argparse.REMAINDER
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def get_rpn_box_proposals(im, args):
merge_cfg_from_file(args.rpn_cfg)
cfg.TEST.WEIGHTS = args.rpn_pkl
cfg.NUM_GPUS = 1
cfg.MODEL.RPN_ONLY = True
cfg.TEST.RPN_PRE_NMS_TOP_N = 10000
cfg.TEST.RPN_POST_NMS_TOP_N = 2000
assert_and_infer_cfg()
model = model_engine.initialize_model_from_cfg()
with c2_utils.NamedCudaScope(0):
boxes, scores = rpn_engine.im_proposals(model, im)
return boxes, scores
def main(args):
dummy_coco_dataset = dummy_datasets.get_coco_dataset()
cfg_orig = yaml.load(yaml.dump(cfg))
im = cv2.imread(args.im_file)
if args.rpn_pkl is not None:
proposal_boxes, _proposal_scores = get_rpn_box_proposals(im, args)
workspace.ResetWorkspace()
else:
proposal_boxes = None
cls_boxes, cls_segms, cls_keyps = None, None, None
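# args.models_to_run is a flat list of alternating model weights (pkl) and
# model config (yaml) entries; run each model in turn, keeping the most
# recent non-None box/mask/keypoint outputs for visualization.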
for i in range(0, len(args.models_to_run), 2):
pkl = args.models_to_run[i]
yml = args.models_to_run[i + 1]
merge_cfg_from_cfg(cfg_orig)
merge_cfg_from_file(yml)
if len(pkl) > 0:
cfg.TEST.WEIGHTS = pkl
cfg.NUM_GPUS = 1
assert_and_infer_cfg()
model = model_engine.initialize_model_from_cfg()
with c2_utils.NamedCudaScope(0):
cls_boxes_, cls_segms_, cls_keyps_ = \
model_engine.im_detect_all(model, im, proposal_boxes)
cls_boxes = cls_boxes_ if cls_boxes_ is not None else cls_boxes
cls_segms = cls_segms_ if cls_segms_ is not None else cls_segms
cls_keyps = cls_keyps_ if cls_keyps_ is not None else cls_keyps
workspace.ResetWorkspace()
vis_utils.vis_one_image(
im[:, :, ::-1],
args.im_file,
args.output_dir,
cls_boxes,
cls_segms,
cls_keyps,
dataset=dummy_coco_dataset,
box_alpha=0.3,
show_class=True,
thresh=0.7,
kp_thresh=2
)
def check_args(args):
assert (
(args.rpn_pkl is not None and args.rpn_cfg is not None) or
(args.rpn_pkl is None and args.rpn_cfg is None)
)
if args.rpn_pkl is not None:
assert os.path.exists(args.rpn_pkl)
assert os.path.exists(args.rpn_cfg)
if args.models_to_run is not None:
assert len(args.models_to_run) % 2 == 0
for model_file in args.models_to_run:
if len(model_file) > 0:
assert os.path.exists(model_file)
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
utils.logging.setup_logging(__name__)
args = parse_args()
check_args(args)
main(args)
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Perform inference on a single image or all images with a certain extension
(e.g., .jpg) in a folder.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from collections import defaultdict
import argparse
import cv2 # NOQA (Must import before importing caffe2 due to bug in cv2)
import glob
import logging
import os
import sys
import time
from caffe2.python import workspace
from core.config import assert_and_infer_cfg
from core.config import cfg
from core.config import merge_cfg_from_file
from utils.timer import Timer
import core.test_engine as infer_engine
import datasets.dummy_datasets as dummy_datasets
import utils.c2 as c2_utils
import utils.logging
import utils.vis as vis_utils
c2_utils.import_detectron_ops()
# OpenCL may be enabled by default in OpenCV3; disable it because it's not
# thread safe and causes unwanted GPU memory allocations.
cv2.ocl.setUseOpenCL(False)
def parse_args():
parser = argparse.ArgumentParser(description='End-to-end inference')
parser.add_argument(
'--cfg',
dest='cfg',
help='cfg model file (/path/to/model_config.yaml)',
default=None,
type=str
)
parser.add_argument(
'--wts',
dest='weights',
help='weights model file (/path/to/model_weights.pkl)',
default=None,
type=str
)
parser.add_argument(
'--output-dir',
dest='output_dir',
help='directory for visualization pdfs (default: /tmp/infer_simple)',
default='/tmp/infer_simple',
type=str
)
parser.add_argument(
'--image-ext',
dest='image_ext',
help='image file name extension (default: jpg)',
default='jpg',
type=str
)
parser.add_argument(
'im_or_folder', help='image or folder of images', default=None
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def main(args):
logger = logging.getLogger(__name__)
merge_cfg_from_file(args.cfg)
cfg.TEST.WEIGHTS = args.weights
cfg.NUM_GPUS = 1
assert_and_infer_cfg()
model = infer_engine.initialize_model_from_cfg()
dummy_coco_dataset = dummy_datasets.get_coco_dataset()
if os.path.isdir(args.im_or_folder):
im_list = glob.iglob(args.im_or_folder + '/*.' + args.image_ext)
else:
im_list = [args.im_or_folder]
for i, im_name in enumerate(im_list):
out_name = os.path.join(
args.output_dir, os.path.basename(im_name) + '.pdf'
)
logger.info('Processing {} -> {}'.format(im_name, out_name))
im = cv2.imread(im_name)
timers = defaultdict(Timer)
t = time.time()
with c2_utils.NamedCudaScope(0):
cls_boxes, cls_segms, cls_keyps = infer_engine.im_detect_all(
model, im, None, timers=timers
)
logger.info('Inference time: {:.3f}s'.format(time.time() - t))
for k, v in timers.items():
logger.info(' | {}: {:.3f}s'.format(k, v.average_time))
if i == 0:
logger.info(
' \ Note: inference on the first image will be slower than the '
'rest (caches and auto-tuning need to warm up)'
)
vis_utils.vis_one_image(
im[:, :, ::-1], # BGR -> RGB for visualization
im_name,
args.output_dir,
cls_boxes,
cls_segms,
cls_keyps,
dataset=dummy_coco_dataset,
box_alpha=0.3,
show_class=True,
thresh=0.7,
kp_thresh=2
)
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
utils.logging.setup_logging(__name__)
args = parse_args()
main(args)
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Script for converting Caffe (<= 1.0) models into the the simple state dict
format used by Detectron. For example, this script can convert the orignal
ResNet models released by MSRA.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import cPickle as pickle
import numpy as np
import os
import sys
from caffe.proto import caffe_pb2
from caffe2.proto import caffe2_pb2
from caffe2.python import caffe_translator
from caffe2.python import utils
from google.protobuf import text_format
def parse_args():
parser = argparse.ArgumentParser(
description='Dump weights from a Caffe model'
)
parser.add_argument(
'--prototxt',
dest='prototxt_file_name',
help='Network definition prototxt file path',
default=None,
type=str
)
parser.add_argument(
'--caffemodel',
dest='caffemodel_file_name',
help='Pretrained network weights file path',
default=None,
type=str
)
parser.add_argument(
'--output',
dest='out_file_name',
help='Output file path',
default=None,
type=str
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
args = parser.parse_args()
return args
def normalize_resnet_name(name):
if name.find('res') == 0 and name.find('res_') == -1:
# E.g.,
# res4b11_branch2c -> res4_11_branch2c
# res2a_branch1 -> res2_0_branch1
chunk = name[len('res'):name.find('_')]
name = (
'res' + chunk[0] + '_' + str(
int(chunk[2:]) if len(chunk) > 2 # e.g., "b1" -> 1
else ord(chunk[1]) - ord('a')
) + # e.g., "a" -> 0
name[name.find('_'):]
)
return name
def pickle_weights(out_file_name, weights):
blobs = {
normalize_resnet_name(blob.name): utils.Caffe2TensorToNumpyArray(blob)
for blob in weights.protos
}
with open(out_file_name, 'w') as f:
pickle.dump(blobs, f, protocol=pickle.HIGHEST_PROTOCOL)
print('Wrote blobs:')
print(sorted(blobs.keys()))
def add_missing_biases(caffenet_weights):
for layer in caffenet_weights.layer:
if layer.type == 'Convolution' and len(layer.blobs) == 1:
num_filters = layer.blobs[0].shape.dim[0]
bias_blob = caffe_pb2.BlobProto()
bias_blob.data.extend(np.zeros(num_filters))
bias_blob.num, bias_blob.channels, bias_blob.height = 1, 1, 1
bias_blob.width = num_filters
layer.blobs.extend([bias_blob])
def remove_spatial_bn_layers(caffenet, caffenet_weights):
# Layer types associated with spatial batch norm
remove_types = ['BatchNorm', 'Scale']
def _remove_layers(net):
for i in reversed(range(len(net.layer))):
if net.layer[i].type in remove_types:
net.layer.pop(i)
# First remove layers from caffenet proto
_remove_layers(caffenet)
# We'll return these so we can save the batch norm parameters
bn_layers = [
layer for layer in caffenet_weights.layer if layer.type in remove_types
]
_remove_layers(caffenet_weights)
def _create_tensor(arr, shape, name):
t = caffe2_pb2.TensorProto()
t.name = name
t.data_type = caffe2_pb2.TensorProto.FLOAT
t.dims.extend(shape.dim)
t.float_data.extend(arr)
assert len(t.float_data) == np.prod(t.dims), 'Data size, shape mismatch'
return t
bn_tensors = []
for (bn, scl) in zip(bn_layers[0::2], bn_layers[1::2]):
assert bn.name[len('bn'):] == scl.name[len('scale'):], 'Pair mismatch'
blob_out = 'res' + bn.name[len('bn'):] + '_bn'
bn_mean = np.asarray(bn.blobs[0].data)
bn_var = np.asarray(bn.blobs[1].data)
scale = np.asarray(scl.blobs[0].data)
bias = np.asarray(scl.blobs[1].data)
std = np.sqrt(bn_var + 1e-5)
new_scale = scale / std
new_bias = bias - bn_mean * scale / std
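# At inference time, BatchNorm followed by Scale computes
#   y = scale * (x - mean) / std + bias
# which is affine in x; folding the two layers gives
#   y = (scale / std) * x + (bias - mean * scale / std)
# so new_scale and new_bias above implement the same function.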
new_scale_tensor = _create_tensor(
new_scale, bn.blobs[0].shape, blob_out + '_s'
)
new_bias_tensor = _create_tensor(
new_bias, bn.blobs[0].shape, blob_out + '_b'
)
bn_tensors.extend([new_scale_tensor, new_bias_tensor])
return bn_tensors
def remove_layers_without_parameters(caffenet, caffenet_weights):
for i in reversed(range(len(caffenet_weights.layer))):
if len(caffenet_weights.layer[i].blobs) == 0:
# Search for the corresponding layer in caffenet and remove it
name = caffenet_weights.layer[i].name
found = False
for j in range(len(caffenet.layer)):
if caffenet.layer[j].name == name:
caffenet.layer.pop(j)
found = True
break
if not found and not name.endswith('_split'):
print('Warning: layer {} not found in caffenet'.format(name))
caffenet_weights.layer.pop(i)
def normalize_shape(caffenet_weights):
for layer in caffenet_weights.layer:
for blob in layer.blobs:
shape = (blob.num, blob.channels, blob.height, blob.width)
if len(blob.data) != np.prod(shape):
shape = tuple(blob.shape.dim)
if len(shape) == 1:
# Handle biases
shape = (1, 1, 1, shape[0])
if len(shape) == 2:
# Handle InnerProduct layers
shape = (1, 1, shape[0], shape[1])
assert len(shape) == 4
blob.num, blob.channels, blob.height, blob.width = shape
def load_and_convert_caffe_model(prototxt_file_name, caffemodel_file_name):
caffenet = caffe_pb2.NetParameter()
caffenet_weights = caffe_pb2.NetParameter()
text_format.Merge(open(prototxt_file_name).read(), caffenet)
caffenet_weights.ParseFromString(open(caffemodel_file_name).read())
# C2 conv layers currently require biases, but they are optional in C1.
# Add zero biases if they are missing.
add_missing_biases(caffenet_weights)
# We only care about getting parameters, so remove layers w/o parameters
remove_layers_without_parameters(caffenet, caffenet_weights)
# BatchNorm is not implemented in the translator *and* we need to fold Scale
# layers into the new C2 SpatialBN op, hence we remove the batch norm layers
# and apply custom translation code
bn_weights = remove_spatial_bn_layers(caffenet, caffenet_weights)
# Set num, channel, height and width for blobs that use shape.dim instead
normalize_shape(caffenet_weights)
# Translate the rest of the model
net, pretrained_weights = caffe_translator.TranslateModel(
caffenet, caffenet_weights
)
pretrained_weights.protos.extend(bn_weights)
return net, pretrained_weights
if __name__ == '__main__':
args = parse_args()
assert os.path.exists(args.prototxt_file_name), \
'Prototxt file does not exist'
assert os.path.exists(args.caffemodel_file_name), \
'Weights file does not exist'
net, weights = load_and_convert_caffe_model(
args.prototxt_file_name, args.caffemodel_file_name
)
pickle_weights(args.out_file_name, weights)
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Reval = re-eval. Re-evaluate saved detections."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import cPickle as pickle
import os
import sys
import yaml
from core.config import cfg
from datasets import task_evaluation
from datasets.json_dataset import JsonDataset
import core.config
import utils.logging
def parse_args():
parser = argparse.ArgumentParser(description='Re-evaluate results')
parser.add_argument(
'output_dir', nargs=1, help='results directory', type=str
)
parser.add_argument(
'--dataset',
dest='dataset_name',
help='dataset to re-evaluate',
default='voc_2007_test',
type=str
)
parser.add_argument(
'--matlab',
dest='matlab_eval',
help='use matlab for evaluation',
action='store_true'
)
parser.add_argument(
'--comp',
dest='comp_mode',
help='competition mode',
action='store_true'
)
parser.add_argument(
'--cfg',
dest='cfg_file',
help='optional config file',
default=None,
type=str
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
args = parser.parse_args()
return args
def do_reval(dataset_name, output_dir, args):
dataset = JsonDataset(dataset_name)
with open(os.path.join(output_dir, 'detections.pkl'), 'rb') as f:
dets = pickle.load(f)
# Override config with the one saved in the detections file
if args.cfg_file is not None:
core.config.merge_cfg_from_cfg(yaml.load(dets['cfg']))
else:
core.config._merge_a_into_b(yaml.load(dets['cfg']), cfg)
results = task_evaluation.evaluate_all(
dataset,
dets['all_boxes'],
dets['all_segms'],
dets['all_keyps'],
output_dir,
use_matlab=args.matlab_eval
)
task_evaluation.log_copy_paste_friendly_results(results)
if __name__ == '__main__':
utils.logging.setup_logging(__name__)
args = parse_args()
if args.comp_mode:
cfg.TEST.COMPETITION_MODE = True
output_dir = os.path.abspath(args.output_dir[0])
do_reval(args.dataset_name, output_dir, args)
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Perform inference on one or more datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import cv2 # NOQA (Must import before importing caffe2 due to bug in cv2)
import os
import pprint
import sys
import time
from caffe2.python import workspace
from core.config import assert_and_infer_cfg
from core.config import cfg
from core.config import merge_cfg_from_file
from core.config import merge_cfg_from_list
from core.rpn_generator import generate_rpn_on_dataset
from core.rpn_generator import generate_rpn_on_range
from core.test_engine import test_net, test_net_on_dataset
from core.test_retinanet import test_retinanet
from core.test_retinanet import test_retinanet_on_dataset
from datasets import task_evaluation
import utils.c2
import utils.logging
utils.c2.import_detectron_ops()
# OpenCL may be enabled by default in OpenCV3; disable it because it's not
# thread safe and causes unwanted GPU memory allocations.
cv2.ocl.setUseOpenCL(False)
def parse_args():
parser = argparse.ArgumentParser(description='Test a Fast R-CNN network')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='optional config file',
default=None,
type=str
)
parser.add_argument(
'--wait',
dest='wait',
help='wait until net file exists',
default=True,
type=bool
)
parser.add_argument(
'--vis', dest='vis', help='visualize detections', action='store_true'
)
parser.add_argument(
'--multi-gpu-testing',
dest='multi_gpu_testing',
help='use cfg.NUM_GPUS GPUs for inference',
action='store_true'
)
parser.add_argument(
'--range',
dest='range',
help='start (inclusive) and end (exclusive) indices',
default=None,
type=int,
nargs=2
)
parser.add_argument(
'opts',
help='See lib/core/config.py for all options',
default=None,
nargs=argparse.REMAINDER
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def main(ind_range=None, multi_gpu_testing=False):
# Determine which parent or child function should handle inference
if cfg.MODEL.RPN_ONLY:
child_func = generate_rpn_on_range
parent_func = generate_rpn_on_dataset
elif cfg.RETINANET.RETINANET_ON:
child_func = test_retinanet
parent_func = test_retinanet_on_dataset
else:
# Generic case that handles all network types other than RPN-only nets
# and RetinaNet
child_func = test_net
parent_func = test_net_on_dataset
is_parent = ind_range is None
if is_parent:
# Parent case:
# In this case we're either running inference on the entire dataset in a
# single process or (if multi_gpu_testing is True) using this process to
# launch subprocesses that each run inference on a range of the dataset
if len(cfg.TEST.DATASETS) == 0:
cfg.TEST.DATASETS = (cfg.TEST.DATASET, )
cfg.TEST.PROPOSAL_FILES = (cfg.TEST.PROPOSAL_FILE, )
all_results = {}
for i in range(len(cfg.TEST.DATASETS)):
cfg.TEST.DATASET = cfg.TEST.DATASETS[i]
if cfg.TEST.PRECOMPUTED_PROPOSALS:
cfg.TEST.PROPOSAL_FILE = cfg.TEST.PROPOSAL_FILES[i]
results = parent_func(multi_gpu=multi_gpu_testing)
all_results.update(results)
task_evaluation.check_expected_results(
all_results,
atol=cfg.EXPECTED_RESULTS_ATOL,
rtol=cfg.EXPECTED_RESULTS_RTOL
)
task_evaluation.log_copy_paste_friendly_results(all_results)
else:
# Subprocess child case:
# In this case test_net was called via subprocess.Popen to execute on a
# range of inputs on a single dataset (i.e., use cfg.TEST.DATASET and
# don't loop over cfg.TEST.DATASETS)
child_func(ind_range=ind_range)
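# A minimal, generic sketch of the range-sharding idea behind the
# parent/child split above. The helper below is illustrative only (it is
# not one of Detectron's subprocess utilities) and assumes a simple even
# split of [0, num_images) across shards:
def _shard_ranges_sketch(num_images, num_shards):
    """Split [0, num_images) into num_shards contiguous [start, end) pairs."""
    stride = (num_images + num_shards - 1) // num_shards
    return [
        (i * stride, min((i + 1) * stride, num_images))
        for i in range(num_shards)
    ]
# e.g., _shard_ranges_sketch(10, 4) == [(0, 3), (3, 6), (6, 9), (9, 10)]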
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])
logger = utils.logging.setup_logging(__name__)
args = parse_args()
logger.info('Called with args:')
logger.info(args)
if args.cfg_file is not None:
merge_cfg_from_file(args.cfg_file)
if args.opts is not None:
merge_cfg_from_list(args.opts)
assert_and_infer_cfg()
logger.info('Testing with config:')
logger.info(pprint.pformat(cfg))
while not os.path.exists(cfg.TEST.WEIGHTS) and args.wait:
logger.info('Waiting for \'{}\' to exist...'.format(cfg.TEST.WEIGHTS))
time.sleep(10)
main(ind_range=args.range, multi_gpu_testing=args.multi_gpu_testing)
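# Example invocation (an illustrative sketch; the paths below are
# placeholders, not files shipped with this commit):
#   python2 tools/test_net.py \
#       --cfg /path/to/config.yaml \
#       TEST.WEIGHTS /path/to/model_final.pkl \
#       NUM_GPUS 1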
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Train a network with Detectron."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import cv2 # NOQA (Must import before importing caffe2 due to bug in cv2)
import datetime
import logging
import numpy as np
import os
import pprint
import re
import sys
import test_net
from caffe2.python import memonger
from caffe2.python import utils as c2_py_utils
from caffe2.python import workspace
from core.config import assert_and_infer_cfg
from core.config import cfg
from core.config import get_output_dir
from core.config import merge_cfg_from_file
from core.config import merge_cfg_from_list
from datasets.roidb import combined_roidb_for_training
from modeling import model_builder
from utils.logging import log_json_stats
from utils.logging import setup_logging
from utils.logging import SmoothedValue
from utils.timer import Timer
import utils.c2
import utils.env as envu
import utils.net as nu
utils.c2.import_contrib_ops()
utils.c2.import_detectron_ops()
# OpenCL may be enabled by default in OpenCV3; disable it because it's not
# thread safe and causes unwanted GPU memory allocations.
cv2.ocl.setUseOpenCL(False)
def parse_args():
parser = argparse.ArgumentParser(
description='Train a network with Detectron'
)
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str
)
parser.add_argument(
'--multi-gpu-testing',
dest='multi_gpu_testing',
help='Use cfg.NUM_GPUS GPUs for inference',
action='store_true'
)
parser.add_argument(
'--skip-test',
dest='skip_test',
help='Do not test the final model',
action='store_true'
)
parser.add_argument(
'opts',
help='See lib/core/config.py for all options',
default=None,
nargs=argparse.REMAINDER
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
class TrainingStats(object):
"""Track vital training statistics."""
def __init__(self, model):
# Window size for smoothing tracked values (with median filtering)
self.WIN_SZ = 20
# Output logging period in SGD iterations
self.LOG_PERIOD = 20
self.smoothed_losses_and_metrics = {
key: SmoothedValue(self.WIN_SZ)
for key in model.losses + model.metrics
}
self.losses_and_metrics = {
key: 0
for key in model.losses + model.metrics
}
self.smoothed_total_loss = SmoothedValue(self.WIN_SZ)
self.smoothed_mb_qsize = SmoothedValue(self.WIN_SZ)
self.iter_total_loss = np.nan
self.iter_timer = Timer()
self.model = model
def IterTic(self):
self.iter_timer.tic()
def IterToc(self):
return self.iter_timer.toc(average=False)
def ResetIterTimer(self):
self.iter_timer.reset()
def UpdateIterStats(self):
"""Update tracked iteration statistics."""
for k in self.losses_and_metrics.keys():
if k in self.model.losses:
self.losses_and_metrics[k] = nu.sum_multi_gpu_blob(k)
else:
self.losses_and_metrics[k] = nu.average_multi_gpu_blob(k)
for k, v in self.smoothed_losses_and_metrics.items():
v.AddValue(self.losses_and_metrics[k])
self.iter_total_loss = np.sum(
np.array([self.losses_and_metrics[k] for k in self.model.losses])
)
self.smoothed_total_loss.AddValue(self.iter_total_loss)
self.smoothed_mb_qsize.AddValue(
self.model.roi_data_loader._minibatch_queue.qsize()
)
def LogIterStats(self, cur_iter, lr):
"""Log the tracked statistics."""
if (cur_iter % self.LOG_PERIOD == 0 or
cur_iter == cfg.SOLVER.MAX_ITER - 1):
eta_seconds = self.iter_timer.average_time * (
cfg.SOLVER.MAX_ITER - cur_iter
)
eta = str(datetime.timedelta(seconds=int(eta_seconds)))
mem_stats = c2_py_utils.GetGPUMemoryUsageStats()
mem_usage = np.max(mem_stats['max_by_gpu'][:cfg.NUM_GPUS])
stats = dict(
iter=cur_iter,
lr=float(lr),
time=self.iter_timer.average_time,
loss=self.smoothed_total_loss.GetMedianValue(),
eta=eta,
mb_qsize=int(
np.round(self.smoothed_mb_qsize.GetMedianValue())
),
mem=int(np.ceil(mem_usage / 1024 / 1024))
)
for k, v in self.smoothed_losses_and_metrics.items():
stats[k] = v.GetMedianValue()
log_json_stats(stats)
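# TrainingStats smooths noisy per-iteration values with a median filter over
# a sliding window. A minimal stand-in for that behavior (an illustrative
# sketch, not the SmoothedValue implementation from utils/logging.py):
from collections import deque


class _MedianSmootherSketch(object):
    def __init__(self, window_size):
        # Fixed-size window; old values fall off automatically
        self.values = deque(maxlen=window_size)

    def AddValue(self, value):
        self.values.append(value)

    def GetMedianValue(self):
        # Median filtering suppresses occasional outlier iterations
        return np.median(list(self.values))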
def main():
# Initialize C2
workspace.GlobalInit(
['caffe2', '--caffe2_log_level=0', '--caffe2_gpu_memory_tracking=1']
)
# Set up logging and load config options
logger = setup_logging(__name__)
logging.getLogger('roi_data.loader').setLevel(logging.INFO)
args = parse_args()
logger.info('Called with args:')
logger.info(args)
if args.cfg_file is not None:
merge_cfg_from_file(args.cfg_file)
if args.opts is not None:
merge_cfg_from_list(args.opts)
assert_and_infer_cfg()
logger.info('Training with config:')
logger.info(pprint.pformat(cfg))
    # Note that while we set the numpy random seed, network training will not
    # be deterministic in general. There are sources of non-determinism that
    # cannot be removed with a reasonable execution-speed tradeoff (such as
    # certain non-deterministic cudnn functions).
np.random.seed(cfg.RNG_SEED)
# Execute the training run
checkpoints = train_model()
# Test the trained model
if not args.skip_test:
test_model(checkpoints['final'], args.multi_gpu_testing, args.opts)
def train_model():
"""Model training loop."""
logger = logging.getLogger(__name__)
model, start_iter, checkpoints, output_dir = create_model()
if 'final' in checkpoints:
# The final model was found in the output directory, so nothing to do
return checkpoints
setup_model_for_training(model, output_dir)
training_stats = TrainingStats(model)
CHECKPOINT_PERIOD = int(cfg.TRAIN.SNAPSHOT_ITERS / cfg.NUM_GPUS)
for cur_iter in range(start_iter, cfg.SOLVER.MAX_ITER):
training_stats.IterTic()
lr = model.UpdateWorkspaceLr(cur_iter)
workspace.RunNet(model.net.Proto().name)
if cur_iter == start_iter:
nu.print_net(model)
training_stats.IterToc()
training_stats.UpdateIterStats()
training_stats.LogIterStats(cur_iter, lr)
if (cur_iter + 1) % CHECKPOINT_PERIOD == 0 and cur_iter > start_iter:
checkpoints[cur_iter] = os.path.join(
output_dir, 'model_iter{}.pkl'.format(cur_iter)
)
nu.save_model_to_weights_file(checkpoints[cur_iter], model)
if cur_iter == start_iter + training_stats.LOG_PERIOD:
# Reset the iteration timer to remove outliers from the first few
# SGD iterations
training_stats.ResetIterTimer()
if np.isnan(training_stats.iter_total_loss):
logger.critical('Loss is NaN, exiting...')
model.roi_data_loader.shutdown()
envu.exit_on_error()
# Save the final model
checkpoints['final'] = os.path.join(output_dir, 'model_final.pkl')
nu.save_model_to_weights_file(checkpoints['final'], model)
# Shutdown data loading threads
model.roi_data_loader.shutdown()
return checkpoints
def create_model():
"""Build the model and look for saved model checkpoints in case we can
resume from one.
"""
logger = logging.getLogger(__name__)
start_iter = 0
checkpoints = {}
output_dir = get_output_dir(training=True)
if cfg.TRAIN.AUTO_RESUME:
# Check for the final model (indicates training already finished)
final_path = os.path.join(output_dir, 'model_final.pkl')
if os.path.exists(final_path):
logger.info('model_final.pkl exists; no need to train!')
return None, None, {'final': final_path}, output_dir
# Find the most recent checkpoint (highest iteration number)
files = os.listdir(output_dir)
for f in files:
iter_string = re.findall(r'(?<=model_iter)\d+(?=\.pkl)', f)
if len(iter_string) > 0:
checkpoint_iter = int(iter_string[0])
if checkpoint_iter > start_iter:
# Start one iteration immediately after the checkpoint iter
start_iter = checkpoint_iter + 1
resume_weights_file = f
if start_iter > 0:
# Override the initialization weights with the found checkpoint
cfg.TRAIN.WEIGHTS = os.path.join(output_dir, resume_weights_file)
logger.info(
'========> Resuming from checkpoint {} at start iter {}'.
format(cfg.TRAIN.WEIGHTS, start_iter)
)
logger.info('Building model: {}'.format(cfg.MODEL.TYPE))
model = model_builder.create(cfg.MODEL.TYPE, train=True)
if cfg.MEMONGER:
optimize_memory(model)
# Performs random weight initialization as defined by the model
workspace.RunNetOnce(model.param_init_net)
return model, start_iter, checkpoints, output_dir
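# The resume logic above recovers the checkpoint iteration from filenames
# with a lookbehind/lookahead regex. A quick illustrative check (kept in a
# comment so it does not run on import):
#
#   >>> re.findall(r'(?<=model_iter)\d+(?=\.pkl)', 'model_iter19999.pkl')
#   ['19999']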
def optimize_memory(model):
"""Save GPU memory through blob sharing."""
for device in range(cfg.NUM_GPUS):
namescope = 'gpu_{}/'.format(device)
losses = [namescope + l for l in model.losses]
model.net._net = memonger.share_grad_blobs(
model.net,
losses,
set(model.param_to_grad.values()),
namescope,
share_activations=cfg.MEMONGER_SHARE_ACTIVATIONS
)
def setup_model_for_training(model, output_dir):
"""Loaded saved weights and create the network in the C2 workspace."""
logger = logging.getLogger(__name__)
add_model_training_inputs(model)
if cfg.TRAIN.WEIGHTS:
# Override random weight initialization with weights from a saved model
nu.initialize_gpu_0_from_weights_file(model, cfg.TRAIN.WEIGHTS)
# Even if we're randomly initializing we still need to synchronize
# parameters across GPUs
nu.broadcast_parameters(model)
workspace.CreateNet(model.net)
logger.info('Outputs saved to: {:s}'.format(os.path.abspath(output_dir)))
dump_proto_files(model, output_dir)
# Start loading mini-batches and enqueuing blobs
model.roi_data_loader.register_sigint_handler()
model.roi_data_loader.start(prefill=True)
return output_dir
def add_model_training_inputs(model):
"""Load the training dataset and attach the training inputs to the model."""
logger = logging.getLogger(__name__)
logger.info('Loading dataset: {}'.format(cfg.TRAIN.DATASETS))
roidb = combined_roidb_for_training(
cfg.TRAIN.DATASETS, cfg.TRAIN.PROPOSAL_FILES
)
logger.info('{:d} roidb entries'.format(len(roidb)))
model_builder.add_training_inputs(model, roidb=roidb)
def dump_proto_files(model, output_dir):
"""Save prototxt descriptions of the training network and parameter
initialization network."""
with open(os.path.join(output_dir, 'net.pbtxt'), 'w') as fid:
fid.write(str(model.net.Proto()))
with open(os.path.join(output_dir, 'param_init_net.pbtxt'), 'w') as fid:
fid.write(str(model.param_init_net.Proto()))
def test_model(model_file, multi_gpu_testing, opts=None):
"""Test a model."""
# All arguments to inference functions are passed via cfg
cfg.TEST.WEIGHTS = model_file
# Clear memory before inference
workspace.ResetWorkspace()
# Run inference
test_net.main(multi_gpu_testing=multi_gpu_testing)
if __name__ == '__main__':
main()
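# Example invocation (an illustrative sketch; the YAML path is a
# placeholder, not a file guaranteed to exist in this repo):
#   python2 tools/train_net.py \
#       --cfg /path/to/config.yaml \
#       OUTPUT_DIR /tmp/detectron-output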
#!/usr/bin/env python2
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""Script for visualizing results saved in a detections.pkl file."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import cPickle as pickle
import cv2
import os
import sys
from datasets.json_dataset import JsonDataset
import utils.vis as vis_utils
# OpenCL may be enabled by default in OpenCV3; disable it because it's not
# thread safe and causes unwanted GPU memory allocations.
cv2.ocl.setUseOpenCL(False)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--dataset',
dest='dataset',
help='dataset',
default='coco_2014_minival',
type=str
)
parser.add_argument(
'--detections',
dest='detections',
help='detections pkl file',
default='',
type=str
)
parser.add_argument(
'--thresh',
dest='thresh',
help='detection prob threshold',
default=0.9,
type=float
)
parser.add_argument(
'--output-dir',
dest='output_dir',
help='output directory',
default='./tmp/vis-output',
type=str
)
parser.add_argument(
'--first',
dest='first',
        help='only visualize the first k images (0 = no limit)',
default=0,
type=int
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
args = parser.parse_args()
return args
def vis(dataset, detections_pkl, thresh, output_dir, limit=0):
ds = JsonDataset(dataset)
roidb = ds.get_roidb()
    # Open in binary mode; detections.pkl is written with a binary pickle
    # protocol, and text mode would corrupt it on some platforms
    with open(detections_pkl, 'rb') as f:
dets = pickle.load(f)
all_boxes = dets['all_boxes']
if 'all_segms' in dets:
all_segms = dets['all_segms']
else:
all_segms = None
if 'all_keyps' in dets:
all_keyps = dets['all_keyps']
else:
all_keyps = None
def id_or_index(ix, val):
if len(val) == 0:
return val
else:
return val[ix]
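    # Note: a per-class entry (e.g., all_boxes[j]) may be an empty list for
    # classes with no detections; id_or_index returns it unchanged rather
    # than indexing into it.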
for ix, entry in enumerate(roidb):
if limit > 0 and ix >= limit:
break
if ix % 10 == 0:
print('{:d}/{:d}'.format(ix + 1, len(roidb)))
im = cv2.imread(entry['image'])
im_name = os.path.splitext(os.path.basename(entry['image']))[0]
cls_boxes_i = [
id_or_index(ix, all_boxes[j]) for j in range(len(all_boxes))
]
if all_segms is not None:
cls_segms_i = [
id_or_index(ix, all_segms[j]) for j in range(len(all_segms))
]
else:
cls_segms_i = None
if all_keyps is not None:
cls_keyps_i = [
id_or_index(ix, all_keyps[j]) for j in range(len(all_keyps))
]
else:
cls_keyps_i = None
vis_utils.vis_one_image(
im[:, :, ::-1],
'{:d}_{:s}'.format(ix, im_name),
os.path.join(output_dir, 'vis'),
cls_boxes_i,
segms=cls_segms_i,
keypoints=cls_keyps_i,
thresh=thresh,
box_alpha=0.8,
dataset=ds,
show_class=True
)
if __name__ == '__main__':
opts = parse_args()
vis(
opts.dataset,
opts.detections,
opts.thresh,
opts.output_dir,
limit=opts.first
)
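# Example invocation (an illustrative sketch; paths are placeholders):
#   python2 tools/visualize_results.py \
#       --dataset coco_2014_minival \
#       --detections /path/to/detections.pkl \
#       --output-dir /tmp/vis-output \
#       --first 10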