When to “add” layers and when to “concatenate” in neural networks?

I am using “add” and “concatenate” as they are defined in Keras. Basically, from my understanding, add sums the inputs element-wise (the inputs being the output tensors of the layers). So if the first layer’s output had a particular value of 0.4 and another layer with the exact same shape had the corresponding value 0.5, then after the add the new value becomes 0.9.
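For concreteness, here is a minimal sketch (assuming TensorFlow’s Keras API; the numbers are purely illustrative) showing that Add sums the incoming tensors element-wise:

```python
import tensorflow as tf

a = tf.constant([[0.4, 1.0]])  # output of one branch, shape (1, 2)
b = tf.constant([[0.5, 2.0]])  # output of another branch, same shape

summed = tf.keras.layers.Add()([a, b])
print(summed.numpy())  # [[0.9 3. ]] -- element-wise sum of the two outputs
```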

However, with concatenate, let’s say the first layer has dimensions 64x128x128 and the second layer has dimensions 32x128x128; then after concatenate, the new dimensions are 96x128x128 (assuming you concatenate along the channel axis).
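A similar sketch (again assuming TensorFlow’s Keras API, with the default channels_last layout, so the channel axis is the last one) shows that concatenating 64-channel and 32-channel feature maps of the same spatial size gives 96 channels:

```python
import tensorflow as tf

x = tf.zeros((1, 128, 128, 64))  # feature maps with 64 channels
y = tf.zeros((1, 128, 128, 32))  # same spatial size, 32 channels

concat = tf.keras.layers.Concatenate(axis=-1)([y, x])
print(concat.shape)  # (1, 128, 128, 96)
```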

Assuming my above intuition is true, when would I use one over the other? Conceptually, add seems to be a sharing of information that potentially distorts it (the two inputs are blended into one), while concatenate shares information in the literal sense (both inputs are kept intact, side by side).

Answer

Adding is nice if you want to interpret one of the inputs as a residual “correction” or “delta” to the other input. For example, the residual connections in ResNet are often interpreted as successively refining the feature maps. Concatenating may be more natural if the two inputs aren’t very closely related. However, the difference is smaller than you may think.
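As an illustration of the two wiring styles (the layer choices and sizes below are made up for the example, not from the original post), a residual-style add requires matching shapes, while concatenation can merge branches with different channel counts:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128, 128, 64))

# Residual-style: the conv branch is a "correction" added back to its input,
# so its output shape must match the input exactly.
delta = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
residual_out = layers.Add()([inputs, delta])

# Concatenation: the branch may have a different channel count; its features
# sit next to the originals instead of being merged into them.
extra = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
concat_out = layers.Concatenate(axis=-1)([inputs, extra])

model = tf.keras.Model(inputs, [residual_out, concat_out])
model.summary()  # residual_out: (None, 128, 128, 64), concat_out: (None, 128, 128, 96)
```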

Note that W[x, y] = W_1 x + W_2 y, where [·,·] denotes concatenation and W is split horizontally (column-wise) into W_1 and W_2. Compare this to W(x + y) = Wx + Wy. So you can interpret adding as a form of concatenation where the two halves of the weight matrix are constrained to W_1 = W_2.
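A quick NumPy check of this identity (the shapes are arbitrary; this is only a sketch of the algebra, not Keras code):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(4)
W = rng.standard_normal((3, 8))      # acts on the concatenated vector [x, y]
W1, W2 = W[:, :4], W[:, 4:]          # horizontal (column-wise) split of W

lhs = W @ np.concatenate([x, y])     # W [x, y]
rhs = W1 @ x + W2 @ y                # W_1 x + W_2 y
print(np.allclose(lhs, rhs))         # True

V = rng.standard_normal((3, 4))
print(np.allclose(V @ (x + y), V @ x + V @ y))  # True: adding = concatenating with W_1 = W_2
```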

Attribution
Source: Link, Question Author: Christian, Answer Author: shimao
