Training a neural network (R) [duplicate]

This question already has answers at “What should I do when my neural network doesn’t learn?” (8 answers); closed 2 years ago. I’m working on a neural network with one hidden layer. So far I’ve implemented the algorithm, and I’ve been able to numerically verify the partial derivatives I get from backpropagation. My problem … Read more
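For reference, the standard numerical check mentioned here is a central finite difference on the loss. A minimal sketch in Python (the question itself is in R); `loss_fn` and `backprop_grad` are placeholder names for the question's own loss function and backprop output:

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central-difference estimate of d(loss)/d(params),
    used to verify the gradients computed by backpropagation."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        orig = params.flat[i]
        params.flat[i] = orig + eps
        loss_plus = loss_fn(params)
        params.flat[i] = orig - eps
        loss_minus = loss_fn(params)
        params.flat[i] = orig  # restore the original value
        grad.flat[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Compare against the analytic gradient from backprop, e.g.:
# assert np.allclose(numerical_gradient(loss_fn, W), backprop_grad, atol=1e-6)
```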

Computing Training Set Perplexity of a Neural Language Model: Too low values

I am implementing a language model based on a deep learning architecture (RNN + softmax). The cost function I am using is the cross-entropy between the vector of probabilities at the softmax layer and the one-hot vector of the target word to predict. For every epoch, I am computing the perplexity as $$PP = \exp\left(\frac{1}{N}\sum_{i=1}^{N} CE_i\right)$$ where $CE_i$ is the cross-entropy of the $i$-th target word and $N$ is the number of … Read more
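For context, the usual definition ties perplexity directly to the average per-word cross-entropy. A minimal numpy sketch, assuming one softmax probability per target word:

```python
import numpy as np

def perplexity(probs_of_targets):
    """Training-set perplexity from the probabilities the model
    assigns to the target words (natural-log cross-entropy).
    probs_of_targets: array of shape (N,), one softmax probability
    per target word, where N is the number of target words."""
    cross_entropy = -np.mean(np.log(probs_of_targets))
    return np.exp(cross_entropy)

# Suspiciously low values often come from averaging over the wrong
# axis (e.g. over sentences or the vocabulary instead of all N
# words) or from mixing log bases (nats vs. bits).
```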

Neural network regression with unit interval target

As part of a multi-task prediction problem I’m building a neural network for a regression problem where the target/response value lies on the unit interval, i.e. is a real value between 0 and 1. Right now, I’m using a sigmoid output layer (logistic function) to bound the output to the unit interval, but I’m not … Read more
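A minimal tf.keras sketch of the setup described; the layer sizes, input width, and optimizer are illustrative, not from the question:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # bounds output to (0, 1)
])
# Two common loss choices for a [0, 1] target: plain MSE, or
# binary cross-entropy treating the target as a probability.
model.compile(optimizer='adam', loss='binary_crossentropy')
```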

What does the class_weight parameter in Keras do during training of neural networks?

I have a heavily imbalanced dataset with 170 columns and 2 million rows, and there is also missing data in the set. Following common practice, I dropped all the null values, normalized the data using the min-max method, and applied different techniques to address the imbalance. I tried random oversampling, random undersampling, SMOTE, SMOTE-Tomek, and SMOTE-ENN, along with … Read more
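As background: `class_weight` in Keras does not resample the data; it rescales each sample's contribution to the loss by the weight of its class, so minority-class errors cost more. A sketch with made-up labels, using the common inverse-frequency weighting:

```python
import numpy as np

# Hypothetical labels: 990 negatives, 10 positives.
y_train = np.array([0] * 990 + [1] * 10)

n = len(y_train)
counts = np.bincount(y_train)
class_weight = {c: n / (len(counts) * counts[c]) for c in range(len(counts))}
# -> {0: ~0.505, 1: 50.0}: a positive sample contributes ~100x more loss.

# Passed to training as:
# model.fit(X_train, y_train, class_weight=class_weight, epochs=10)
```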

Convolutional Neural Networks regularization

I’m working on CNNs and I have a question about regularization. Max-norm constraints (a form of weight clipping, not gradient clipping) rescale the weight vector so that it satisfies $\|W\|_2 < c$, normally in the layer where dropout is applied. If we instead enforced this constraint with a Lagrange multiplier, we would have the added term $+\lambda (\|W\|_2 - c)$, which is … Read more
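For reference, the Keras implementation of max-norm is a hard projection applied to the weights after each update, not a penalty term added to the loss. A sketch with an illustrative `max_value`:

```python
import tensorflow as tf

# Max-norm as a hard constraint: after each gradient update, any
# weight column with ||w||_2 > c is rescaled back onto the ball.
layer = tf.keras.layers.Dense(
    128,
    activation='relu',
    kernel_constraint=tf.keras.constraints.MaxNorm(max_value=3.0),
)
drop = tf.keras.layers.Dropout(0.5)  # max-norm is often paired with dropout
```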

Loss function to maximize sum of targets

I have a dataset $\{X_i, Y_i\}$, where the $Y_i$ are real-valued targets that can be negative. The task is to train a classifier $f: X_i \rightarrow \{0, 1\}$ that maximizes the sum below $$ L(f) = \sum_i f(X_i)\, Y_i $$ So $f(X_i) = I[Y_i > 0]$ would be the ideal classifier. I’m looking for an appropriate smooth … Read more
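One candidate smooth relaxation (a suggestion, not from the question): replace the hard decision $f \in \{0, 1\}$ with a sigmoid of a model score and minimize the negated sum. A numpy sketch with its analytic gradient:

```python
import numpy as np

def surrogate_loss(scores, y):
    """Smooth relaxation of L(f) = sum_i f(X_i) * Y_i: use
    sigma(score) in (0, 1) in place of the hard f, and minimize
    the negative sum. scores = g(X) from any differentiable model."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return -np.sum(p * y)

def surrogate_grad(scores, y):
    """d(loss)/d(scores), using sigma'(s) = sigma(s) * (1 - sigma(s))."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return -(p * (1.0 - p)) * y
```

As the scores grow in magnitude, the sigmoid saturates toward the ideal hard classifier $I[Y_i > 0]$.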

Can a neural network learn a functional, and its functional derivative?

I understand that neural networks (NNs) can be considered universal approximators to both functions and their derivatives, under certain assumptions (on both the network and the function to approximate). In fact, I have done a number of tests on simple, yet non-trivial functions (e.g., polynomials), and it seems that I can indeed approximate them and … Read more
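One way to run that test, sketched with tf.keras autodiff; the model below is an untrained stand-in, to be fitted to samples of $f$ first:

```python
import tensorflow as tf

# Illustrative scalar-in, scalar-out network approximating f(x).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(32, activation='tanh'),
    tf.keras.layers.Dense(1),
])  # untrained here; fit to (x, f(x)) samples before checking f'

x = tf.reshape(tf.linspace(-1.0, 1.0, 100), (-1, 1))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)               # network's approximation of f(x)
dy_dx = tape.gradient(y, x)    # network's approximation of f'(x)
# Compare dy_dx against the true derivative on the same grid.
```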

What is the advantage of using BPTT along with teacher forcing?

In Section 10.2.1 of the Deep Learning book (available at deeplearningbook.org), the authors mention that when a recurrent neural network has both hidden-to-hidden and output-to-hidden feedback, we can use both backpropagation through time (BPTT) and teacher forcing as learning methods. I think the main advantage of teacher forcing is to parallelize training across different time … Read more
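For orientation, a bare numpy sketch of teacher forcing (weight names are illustrative): the ground-truth token from step $t-1$ is fed as the input at step $t$. With output-to-hidden feedback only, every step then depends on known quantities and the steps decouple; a hidden-to-hidden connection still chains state across time, which is where BPTT remains necessary:

```python
import numpy as np

def rnn_step(h, x, W_hh, W_xh, W_hy):
    """One recurrent step: new hidden state and output logits."""
    h_new = np.tanh(W_hh @ h + W_xh @ x)
    return h_new, W_hy @ h_new

def forward_teacher_forced(targets, h0, params):
    """Teacher forcing: feed the ground-truth targets[t-1] as the
    input at step t instead of the model's own previous output."""
    W_hh, W_xh, W_hy = params
    h, outputs = h0, []
    for t in range(1, len(targets)):
        h, y = rnn_step(h, targets[t - 1], W_hh, W_xh, W_hy)
        outputs.append(y)
    return outputs
```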

Batch normalization – how to compute mean and standard deviation

I am trying to understand how batch normalization works. If I understand it correctly, when using batch normalization in a certain layer, the activations of all units/neurons in that layer are normalized to have zero mean and unit variance. Is that correct? If so, how do I calculate the mean and variance? As a main … Read more
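To make the statistics concrete, a numpy sketch of the training-time computation for a fully connected layer: the mean and variance are taken per unit over the batch axis, so no covariance matrix is involved:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization for a fully connected layer.
    x: (batch_size, num_units) pre-activations. Statistics are
    computed independently for each unit across the batch."""
    mu = x.mean(axis=0)                 # shape (num_units,)
    var = x.var(axis=0)                 # shape (num_units,)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta         # learned scale and shift
```

At test time, running averages of `mu` and `var` collected during training are used instead of per-batch statistics.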

Most common method for deciding when to stop training a neural net on a batch

I have created my own neural net, which uses batch gradient descent; in other words, it trains on a whole batch of examples at once. My issue is figuring out when to stop training on the batch. I’ll try to make things as understandable as possible, since there are so many options, … Read more
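For reference, the most common rule in practice is early stopping on a held-out validation loss with a patience window. A sketch where `step_fn` and `val_loss_fn` are placeholders for the question's own training step and validation evaluation:

```python
import numpy as np

def train_with_early_stopping(step_fn, val_loss_fn, max_epochs=1000,
                              patience=10, min_delta=1e-4):
    """Stop when the validation loss has not improved by at least
    min_delta for `patience` consecutive epochs. step_fn runs one
    full-batch gradient step; val_loss_fn returns validation loss."""
    best, best_epoch = np.inf, 0
    for epoch in range(max_epochs):
        step_fn()
        loss = val_loss_fn()
        if loss < best - min_delta:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # patience exhausted; validation loss has plateaued
    return best
```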