Davis King authored
Fixed a bug in the concat layer's backward() method. It was assigning the gradient to previous layers instead of adding to it, as required by the layer interface specification. This change also noticeably speeds up concat layers, since only one CUDA kernel launch now happens per concat operation rather than one launch for each sample in a tensor.
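
A minimal CPU sketch of the assign-versus-add distinction described above. It is an illustration under stated assumptions, not the code in this commit: `copy_channels` and `concat_backward` are hypothetical names, tensors are modeled as flat row-major buffers of shape (num_samples, channels), and the single loop nest over all samples only stands in for the one-kernel-launch-per-concat behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the library's API): copy or accumulate `count`
// channels from src into dest for every sample in one pass.
// add_to == false overwrites dest; add_to == true adds to what is there.
void copy_channels(
    bool add_to,
    std::vector<float>& dest, std::size_t dest_k, std::size_t dest_offset,
    const std::vector<float>& src, std::size_t src_k, std::size_t src_offset,
    std::size_t num_samples, std::size_t count)
{
    assert(dest.size() == num_samples * dest_k);
    assert(src.size() == num_samples * src_k);
    assert(dest_offset + count <= dest_k);
    assert(src_offset + count <= src_k);

    // One pass covers all samples, standing in for a single kernel launch
    // per concat operation instead of one launch per sample.
    for (std::size_t n = 0; n < num_samples; ++n) {
        for (std::size_t c = 0; c < count; ++c) {
            const float v = src[n * src_k + src_offset + c];
            float& d = dest[n * dest_k + dest_offset + c];
            if (add_to) d += v; else d = v;
        }
    }
}

// Hypothetical backward pass for a concat of two inputs along the channel
// dimension. grad_out has ka + kb channels. Each input's gradient buffer
// may already hold contributions from other consumers, so we add to it.
void concat_backward(
    const std::vector<float>& grad_out,
    std::vector<float>& grad_a, std::size_t ka,
    std::vector<float>& grad_b, std::size_t kb,
    std::size_t num_samples)
{
    copy_channels(true, grad_a, ka, 0, grad_out, ka + kb, 0,  num_samples, ka);
    copy_channels(true, grad_b, kb, 0, grad_out, ka + kb, ka, num_samples, kb);
}
```

Passing `false` for `add_to` in `concat_backward` would reproduce the bug: any gradient already stored in `grad_a` or `grad_b` would be overwritten rather than accumulated.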
7078cfaf