## Would you interpret this image as a correct deconvolution process?

I’m applying the function conv2d_grad_wrt_inputs in Theano to deconvolve a feature map back into the original image. In the figure below, the first image on the left is the input image to which I apply the convolution; the result is the feature map in the second image, and the third image shows the deconvolution. I’m applying … Read more

## Problems understanding “equivariance to translation” example in deep learning book by Goodfellow et al

I am trying to understand the following part about equivariance to translation from the deep learning book by Goodfellow, Bengio and Courville (chapter 9.2, pages 338-339): To say a function is equivariant means that if the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g … Read more
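The book's condition is f(g(x)) = g(f(x)), and it can be checked numerically. A minimal sketch (not from the question), using circular 1-D convolution and a circular shift as g so boundary effects don't break the equality:

```python
import numpy as np

def conv_circular(x, k):
    # Circular 1-D convolution: out[i] = sum_j x[(i - j) % n] * k[j]
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([1.0, 0.0, -1.0])

def g(v):
    # Translation by one step (circular shift)
    return np.roll(v, 1)

# Equivariance: convolving the shifted input equals shifting the output
lhs = conv_circular(g(x), k)
rhs = g(conv_circular(x, k))
print(np.allclose(lhs, rhs))  # True
```

Shifting the input by one step and convolving gives the same result as convolving first and then shifting, which is exactly the equivariance property the book attributes to convolution.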

## Trying to build a model to suggest related questions

I am trying to build a model which can recommend/predict a follow-up question based on the previous question. For example, after “Did you attend college?” a follow-up question might be “What was your major at college?” I am trying to build a deep learning model for this. An approach that seems possible is using “Seq2Seq” … Read more

## How would you interpret decreasing cost but increasing training and validation error during epochs?

I’m training a fully convolutional network whose final layer is global sum pooling, with no intermediate pooling in the network. I have already tested it with global average pooling, and it converged very well. The reason I’m testing global sum pooling is to penalize the activation maps harder and make them sensitive only to the actual … Read more
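One relevant detail behind this question: for a fixed activation map, global sum pooling and global average pooling differ only by the constant factor H×W, so the sum-pooled network sees gradients scaled up by the map size. A toy sketch (the numbers are illustrative, not from the question):

```python
import numpy as np

# A toy 4x4 activation map from one channel
A = np.arange(16, dtype=float).reshape(4, 4)

gap = A.mean()   # global average pooling
gsp = A.sum()    # global sum pooling

# Sum pooling is average pooling rescaled by H*W = 16
print(gsp == gap * A.size)  # True

# Consequently the gradient of the pooled value w.r.t. each activation is
# d(gsp)/dA = 1 everywhere, versus d(gap)/dA = 1/(H*W):
# the sum-pooled loss pushes on the activation map H*W times harder.
```

This scaling is one plausible reason a learning rate that converged under average pooling could produce diverging training error under sum pooling.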

## Difficulties making relatively shallow deep nets converge with SmoothL1, L1, or L2 loss

I want to do bounding-box regression as in the R-CNN paper, which comes down to predicting scalar values by minimizing a smoothed L1 loss. I have to predict relatively small offsets (a few pixels) in around 20,000 patches containing centered objects or their shifted versions. Girshick’s parametrization is: (offset_x, offset_y) = (Δx / width, Δy / height). So if my patches are 45×45 and … Read more
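The parametrization normalizes the pixel shift by the patch size, so a few-pixel offset becomes a small fraction. A minimal sketch (function name and numbers are illustrative, not from the paper or the question):

```python
def bbox_targets(px, py, pw, ph, gx, gy):
    # Girshick-style regression targets: the centre shift from the
    # patch/proposal (px, py) to the ground truth (gx, gy), divided
    # by the patch width and height.
    tx = (gx - px) / pw
    ty = (gy - py) / ph
    return tx, ty

# A 45x45 patch whose object centre is shifted by 3 pixels in x
tx, ty = bbox_targets(px=22.5, py=22.5, pw=45.0, ph=45.0, gx=25.5, gy=22.5)
print(tx, ty)  # tx = 3/45 ≈ 0.067, ty = 0.0 -- targets are small fractions
```

With shifts of only a few pixels in a 45×45 patch, all targets land in a narrow band near zero, which is part of why a plain L2/L1/SmoothL1 loss on them can produce very small gradients.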

## Batch normalization – how to compute mean and standard deviation

I am trying to understand how batch normalization works. If I understand it correctly, when using batch normalization in a certain layer, the activations of all units/neurons in that layer are normalized to have zero mean and unit variance. Is that correct? If so, how do I calculate the mean and variance? As a main … Read more
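A minimal sketch of what the question describes, assuming the usual feedforward case where the mean and variance are computed per feature over the mini-batch (the learnable scale/shift parameters gamma and beta are part of standard batch norm):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Statistics are taken over the batch axis,
    # giving one mean and one variance per feature.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learnable rescale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=5.0, size=(8, 3))
y = batch_norm(x, gamma=1.0, beta=0.0)
print(y.mean(axis=0), y.std(axis=0))  # per-feature mean ~0, std ~1
```

With gamma = 1 and beta = 0 the output has (approximately) zero mean and unit variance per feature; during training the per-batch statistics are used, while at test time running averages of them are substituted.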

## Deep Learning Book – deriving sigmoid units for Bernoulli output [duplicate]

This question already has answers here: Motivating sigmoid output units in neural networks starting with unnormalized log probabilities linear in z = wᵀh + b and ϕ(z) (4 answers) Closed 3 years ago. In the paragraph before equation 6.20, the book says: “…If we begin with the assumption that the unnormalized log probabilities are linear in y and z, … Read more
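For reference, the derivation the book sketches runs as follows: assume the unnormalized log probability is linear in y and z,

$$\log \tilde P(y) = yz, \qquad \tilde P(y) = e^{yz},$$

then normalize over the two values $y \in \{0, 1\}$:

$$P(y) = \frac{e^{yz}}{e^{0 \cdot z} + e^{1 \cdot z}} = \frac{e^{yz}}{1 + e^{z}} = \sigma\big((2y - 1)\,z\big).$$

Checking both cases: $y = 1$ gives $e^{z}/(1 + e^{z}) = \sigma(z)$, and $y = 0$ gives $1/(1 + e^{z}) = \sigma(-z)$, so the sigmoid output unit falls out of the linearity assumption.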

## How to get continuous output with Convolutional network in Keras?

I’m new to using convolutional neural networks with Keras. I have code that classifies a set of images into 2 classes [0, 1] using a CNN in Keras, but I need to convert this code to produce a continuous output (linear regression, …) in Keras. Could you explain this to me? Here is my code: # input … Read more
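The general recipe for this conversion is to replace the final classification layer with a single linear output unit and switch to a regression loss. A hedged sketch (the architecture below is illustrative, not the asker's actual code):

```python
# Minimal Keras CNN for regression: one linear output unit, MSE loss.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(32, activation='relu'),
    Dense(1, activation='linear'),  # continuous output instead of softmax/sigmoid
])

# Regression loss instead of (binary_)crossentropy
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```

Training then proceeds with `model.fit(X, y)` where `y` holds continuous targets rather than class labels; `model.predict` returns unbounded real values.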

## What is the difference between ‘regular’ linear regression and deep learning linear regression?

I want to know the difference between linear regression in a regular machine learning analysis and linear regression in a “deep learning” setting. What algorithms are used for linear regression in the deep learning setting? Answer: Assuming that by deep learning you mean, more precisely, neural networks: a vanilla fully connected feedforward neural network with only linear … Read more
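The answer's point can be demonstrated directly: a network with no hidden layer and a linear output is exactly the linear regression model ŷ = Xw + b, and the "deep learning" way of fitting it is gradient descent on the squared error rather than the closed-form solution. A small sketch (data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.7              # noise-free linear targets

# A "network" with zero hidden layers: y_hat = X @ w + b,
# trained by gradient descent on mean squared error.
w = np.zeros(3)
b = 0.0
lr = 0.1
for _ in range(2000):
    err = X @ w + b - y           # residuals
    w -= lr * X.T @ err / len(y)  # gradient of MSE w.r.t. weights
    b -= lr * err.mean()          # gradient of MSE w.r.t. bias

print(np.round(w, 2), round(b, 2))  # recovers ~[1.5, -2.0, 0.5] and ~0.7
```

Gradient descent converges to the same parameters that ordinary least squares would give; the difference is purely in the optimization procedure, not the model.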

## What’s the name of this recurrent neural network?

I remember recently seeing or reading a paper about a new type of recurrent neural network that enabled long-term memory over sequences by having only part of the neurons active at any given timestep. The timestep-unrolled connections looked something like this (possibly not exactly like this). This was done to model time series, … Read more