Training a neural network (R) [duplicate]

This question already has answers here: What should I do when my neural network doesn’t learn? (8 answers) Closed 2 years ago. I’m working on a neural network with one hidden layer. So far I’ve implemented the algorithm, and I’ve been able to numerically verify the partial derivatives I get from backpropagation. My problem … Read more
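A standard way to verify backpropagation gradients numerically is a central-difference check against the analytic gradient; a minimal NumPy sketch (the quadratic loss here is just an illustrative stand-in for a network's loss):

```python
import numpy as np

def numerical_grad(loss_fn, params, eps=1e-5):
    """Central-difference estimate of d(loss)/d(params)."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        orig = params.flat[i]
        params.flat[i] = orig + eps
        loss_plus = loss_fn(params)
        params.flat[i] = orig - eps
        loss_minus = loss_fn(params)
        params.flat[i] = orig  # restore the original value
        grad.flat[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Illustrative example: L(w) = sum(w^2), whose analytic gradient is 2*w
w = np.array([1.0, -2.0, 0.5])
loss = lambda p: np.sum(p ** 2)
num = numerical_grad(loss, w)
ana = 2 * w
assert np.allclose(num, ana, atol=1e-6)
```

The same check applies to a real network by treating one weight matrix at a time as `params` and the full forward pass as `loss_fn`.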

Computing Training Set Perplexity of a Neural Language Model: Too low values

I am implementing a Language Model based on a Deep Learning architecture (RNN+Softmax). The cost function I am using is the cross-entropy between the vector of probabilities at the softmax layer and the one-hot vector of the target word to predict. For every epoch, I am computing the perplexity as $$PP = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i)\right)$$ where $N$ is the number of … Read more
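Perplexity is the exponential of the average per-token cross-entropy; a minimal sketch of that computation (function and variable names are illustrative):

```python
import math

def perplexity(target_probs):
    """Perplexity = exp of the mean per-token cross-entropy -log p(w_i),
    where target_probs[i] is the model's probability of the true word i."""
    n = len(target_probs)
    cross_entropy = -sum(math.log(p) for p in target_probs) / n
    return math.exp(cross_entropy)

# Sanity check: a uniform model over a 10-word vocabulary has perplexity 10
uniform = [0.1] * 5
print(perplexity(uniform))  # 10.0 (up to floating-point error)
```

A common source of suspiciously low values is averaging log-probabilities over the wrong count (e.g., batches instead of tokens), so checking against a uniform baseline like this is a useful sanity test.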

Neural network regression with unit interval target

As part of a multi-task prediction problem I’m building a neural network for a regression problem where the target/response value lies on the unit interval, i.e. is a real value between 0 and 1. Right now, I’m using a sigmoid output layer (logistic function) to bound the output to the unit interval, but I’m not … Read more
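One alternative to a sigmoid output trained with squared error is to keep the sigmoid but train with binary cross-entropy, which also accepts fractional targets in [0, 1]; a small NumPy sketch (names illustrative) showing that both losses agree on the optimal logit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(logit, target):
    """Squared error after squashing the logit to (0, 1)."""
    return (sigmoid(logit) - target) ** 2

def bce_loss(logit, target):
    """Binary cross-entropy; valid for fractional targets in [0, 1]."""
    p = sigmoid(logit)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

# Both losses are minimized when sigmoid(logit) == target
target = 0.3
logits = np.linspace(-5, 5, 1001)
best_mse = logits[np.argmin(mse_loss(logits, target))]
best_bce = logits[np.argmin(bce_loss(logits, target))]
# both near log(0.3 / 0.7), the inverse sigmoid of the target
```

The difference is in the gradients: cross-entropy avoids the vanishing gradient that squared error has when the sigmoid saturates near 0 or 1.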

What does the class_weight parameter in Keras do during training of Neural Networks?

I have a heavily imbalanced dataset with 170 columns and 2 million rows, and there is also missing data in the set. Following standard practice, I dropped all the null values, normalized the data using the min-max method, and applied different techniques to address the imbalance. I tried random oversampling, random undersampling, SMOTE, SMOTE-Tomek and SMOTE-ENN, along with … Read more
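In Keras, `class_weight` maps each class label to a multiplier applied to that sample's term in the loss; a NumPy sketch of the weighted binary cross-entropy it effectively computes (the weights below are illustrative, not a recommendation):

```python
import numpy as np

def weighted_bce(y_true, y_pred, class_weight):
    """Per-sample binary cross-entropy, scaled by the weight of each
    sample's true class -- this is what class_weight does to the loss."""
    w = np.where(y_true == 1, class_weight[1], class_weight[0])
    ce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return np.mean(w * ce)

y_true = np.array([0, 0, 0, 1])          # imbalanced: 3 negatives, 1 positive
y_pred = np.array([0.2, 0.1, 0.3, 0.6])  # model's predicted probabilities
# Up-weight the minority class, e.g. roughly inversely to its frequency
loss = weighted_bce(y_true, y_pred, class_weight={0: 1.0, 1: 3.0})
```

Because the weighting acts on the loss rather than the data, it is an alternative to resampling: errors on the minority class simply produce larger gradients.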

Loss function to maximize sum of targets

I have a dataset $\{X_i, Y_i\}$. The $Y_i$ are real-valued targets and can be negative. The task is to train a classifier $f: X_i \rightarrow \{0, 1\}$ that maximizes the sum below $$ L(f) = \sum_i f(X_i) \, Y_i $$ So $f(X_i) = I[Y_i > 0]$ would be the ideal classifier. I’m looking for an appropriate smooth … Read more
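One common smooth relaxation replaces the hard decision $f \in \{0, 1\}$ with a sigmoid of a real-valued score and maximizes the resulting expected sum; a NumPy sketch (the score values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def surrogate_loss(scores, y):
    """Negative of the smooth objective sum_i sigma(s_i) * y_i.
    Minimizing it pushes sigma(s_i) -> 1 where y_i > 0 and -> 0 where y_i < 0."""
    return -np.sum(sigmoid(scores) * y)

y = np.array([2.0, -1.0, 0.5, -3.0])
good = np.array([5.0, -5.0, 5.0, -5.0])   # score sign matches sign of y
bad = -good                                # score sign opposed to y
assert surrogate_loss(good, y) < surrogate_loss(bad, y)
```

The loss is differentiable everywhere, so `scores` can be the output of any network trained by gradient descent, and thresholding the sigmoid at 0.5 recovers a hard classifier at test time.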

Can a neural network learn a functional, and its functional derivative?

I understand that neural networks (NNs) can be considered universal approximators to both functions and their derivatives, under certain assumptions (on both the network and the function to approximate). In fact, I have done a number of tests on simple, yet non-trivial functions (e.g., polynomials), and it seems that I can indeed approximate them and … Read more

What is the advantage of using BPTT along with teacher forcing?

In section 10.2.1 of the Deep Learning book, the authors mention that when we have both hidden-to-hidden and output-to-hidden feedback connections in recurrent neural networks, we can use both backpropagation through time (BPTT) and teacher forcing as learning methods. I think the main advantage of teacher forcing is to parallelize training of different time … Read more

Proper loss function

I’m working with a dataset dealing with product sales in a supermarket. I’m using a large neural network (by large I mean I have many inputs compared to the number of training instances) to forecast the sales, but I would like to reduce the input dimension using an autoencoder. Let me describe the input data … Read more
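As a dependency-free sketch of the reduction step, the optimal *linear* autoencoder coincides with PCA: encode with the top-$k$ principal directions and decode with their transpose (the data below is synthetic, not the sales data; a nonlinear autoencoder generalizes this with learned encoder/decoder networks):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples, 20 correlated inputs that actually
# live on a 3-dimensional subspace
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 20))
X = X - X.mean(axis=0)               # center the features

k = 3
U, S, Vt = np.linalg.svd(X, full_matrices=False)
encoder = Vt[:k].T                   # (20, k): projects inputs down to k dims
code = X @ encoder                   # reduced representation for the forecaster
X_hat = code @ encoder.T             # decoder: reconstruct the inputs

mse = np.mean((X_hat - X) ** 2)      # near zero: k components capture the structure
```

In the multi-task setting described, `code` would replace the raw inputs, shrinking the input layer from 20 units to `k` while keeping most of the variance.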

Batch normalization – how to compute mean and standard deviation

I am trying to understand how batch normalization works. If I understand it correctly, when using batch normalization in a certain layer, the activations of all units/neurons in that layer are normalized to have zero mean and unit variance. Is that correct? If so, how do I calculate the mean and variance? As a main … Read more
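During training, the statistics are computed per unit across the mini-batch dimension; a NumPy sketch (`gamma` and `beta` are the learned scale and shift of batch normalization):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each unit (column) over the batch (rows), then apply the
    learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)                 # per-unit mean over the batch
    var = x.var(axis=0)                   # per-unit variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [5.0, 50.0]])               # batch of 3, layer with 2 units
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
# Each column of `out` now has (approximately) zero mean and unit variance
```

So the answer to "mean of what?" is: the mean and variance of each unit's activation taken over the samples in the current mini-batch (with running averages of these used at inference time).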