## Training a neural network (R) [duplicate]

*This question already has answers here: “What should I do when my neural network doesn’t learn?” (8 answers). Closed 2 years ago.* I’m working on a neural network with one hidden layer. So far I’ve implemented the algorithm, and I’ve been able to numerically verify the partial derivatives I get from backpropagation. My problem … Read more
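The numerical verification the asker mentions is usually a central-difference gradient check. A minimal, illustrative sketch (not the asker's code; the quadratic loss is just a stand-in with a known analytic gradient):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-6):
    """Central-difference approximation of df/dw, one entry at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        w_plus = w.copy();  w_plus.flat[i] += eps
        w_minus = w.copy(); w_minus.flat[i] -= eps
        g.flat[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return g

# Toy check on a quadratic loss, where the analytic gradient is known.
w = np.array([0.5, -1.2, 3.0])
f = lambda w: 0.5 * np.sum(w ** 2)    # loss
analytic = w                          # d/dw 0.5*||w||^2 = w
numeric = numerical_grad(f, w)
print(np.max(np.abs(analytic - numeric)))  # close to zero (~1e-9)
```

For a real network, `f` would evaluate the full loss with one parameter perturbed; the backprop gradients should match the numerical ones to several significant digits.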

## Computing Training Set Perplexity of a Neural Language Model: Too low values

I am implementing a Language Model based on a Deep Learning architecture (RNN + softmax). The cost function I am using is the cross-entropy between the vector of probabilities at the softmax layer and the one-hot vector of the target word to predict. For every epoch, I am computing the perplexity as $PP = \exp\left(\frac{1}{N}\sum_{t=1}^{N} CE_t\right)$, where $N$ is the number of … Read more
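Perplexity is the exponential of the average negative log-likelihood of the targets. A minimal sketch, assuming we already have the model's probability for each correct word:

```python
import numpy as np

def perplexity(probs_of_targets):
    """probs_of_targets: model probability assigned to each correct word.
    Perplexity = exp(average negative log-likelihood)."""
    nll = -np.log(probs_of_targets)
    return float(np.exp(nll.mean()))

# Sanity check: a uniform model over a 100-word vocabulary assigns
# p = 1/100 to every target, so its perplexity must be exactly 100.
uniform = np.full(1000, 1 / 100)
print(perplexity(uniform))  # 100.0 (up to floating point)
```

A common cause of suspiciously low values is a base mismatch: if the cross-entropy is computed in bits (log base 2) but exponentiated with `e` (or vice versa), the perplexity comes out too small.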

## Neural network regression with unit interval target

As part of a multi-task prediction problem I’m building a neural network for a regression problem where the target/response value lies on the unit interval, i.e. is a real value between 0 and 1. Right now, I’m using a sigmoid output layer (logistic function) to bound the output to the unit interval, but I’m not … Read more
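One reason the sigmoid-output setup works is that binary cross-entropy remains a proper loss for *soft* targets in [0, 1]: its expected value is minimized exactly when the prediction equals the target. A small numerical illustration (hypothetical values, not the asker's data):

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy; also valid for soft targets y_true in [0, 1]."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-y_true * np.log(y_pred)
                         - (1 - y_true) * np.log(1 - y_pred)))

# BCE against a soft target of 0.3 is smallest when the prediction is 0.3,
# which is what makes it usable for unit-interval regression.
target = np.array([0.3])
losses = {p: bce(target, np.array([p])) for p in (0.1, 0.3, 0.5, 0.9)}
best = min(losses, key=losses.get)
print(best)  # 0.3
```

The usual alternative is mean squared error on the sigmoid output; BCE typically gives stronger gradients near the boundaries of the interval.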

## What does the class_weight function in keras do during training of Neural Networks?

I have a heavily imbalanced dataset with 170 columns and 2 million rows, and there is also missing data in the set. As is common practice, I dropped all the null values, normalized the data using min-max scaling, and applied different techniques to address the imbalance. I tried random oversampling, random undersampling, SMOTE, SMOTE-Tomek and SMOTE-ENN, along with … Read more
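In Keras, `class_weight` multiplies each sample's loss contribution by the weight assigned to its class, so rarer classes influence the gradient more. A common way to build the weight dict is the "balanced" heuristic (`n_samples / (n_classes * count)`, the same formula scikit-learn uses); a sketch:

```python
import numpy as np

def balanced_class_weights(y):
    """'Balanced' heuristic: weight_c = n_samples / (n_classes * count_c),
    so rarer classes contribute more to the loss."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# 90/10 imbalance: the minority class gets ~9x the majority's weight.
y = np.array([0] * 90 + [1] * 10)
weights = balanced_class_weights(y)
print(weights)  # {0: 0.555..., 1: 5.0}
# In Keras, this dict would be passed as model.fit(..., class_weight=weights);
# each training sample's loss is scaled by the weight of its class.
```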

## Loss function to maximize sum of targets

I have a dataset $\{X_i, Y_i\}$, where the $Y_i$ are real-valued targets that can be negative. The task is to train a classifier $f: X_i \rightarrow \{0, 1\}$ that maximizes the sum $$L(f) = \sum_i f(X_i) \cdot Y_i$$ So $f(X_i) = I[Y_i > 0]$ would be the ideal classifier. I’m looking for an appropriate smooth … Read more
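One standard smoothing is to replace the hard indicator $f$ with a sigmoid of a learned score, giving the differentiable surrogate $\sum_i \sigma(g(X_i))\,Y_i$ that can be maximized by gradient ascent. A toy sketch with a linear score (the linear model and the synthetic data are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def objective(w, X, Y):
    """Smooth surrogate for L(f) = sum_i f(X_i) * Y_i, with the hard
    indicator replaced by a sigmoid of a linear score."""
    return float(np.sum(sigmoid(X @ w) * Y))

# Toy 1-D problem where sign(X) matches sign(Y), so gradient ascent
# on the surrogate should increase the objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
Y = np.sign(X[:, 0])
w = np.zeros(1)
for _ in range(200):                         # plain gradient ascent
    s = sigmoid(X @ w)
    grad = X.T @ (s * (1 - s) * Y)           # d/dw sum sigmoid(Xw) * Y
    w += 0.1 * grad
print(objective(w, X, Y) > objective(np.zeros(1), X, Y))  # True
```

At inference time the smooth score is thresholded back to a hard decision, e.g. predict 1 when $\sigma(g(x)) > 0.5$.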

## Can a neural network learn a functional, and its functional derivative?

I understand that neural networks (NNs) can be considered universal approximators to both functions and their derivatives, under certain assumptions (on both the network and the function to approximate). In fact, I have done a number of tests on simple, yet non-trivial functions (e.g., polynomials), and it seems that I can indeed approximate them and … Read more

## What is the advantage of using BPTT along with teacher forcing?

In section 10.2.1 of the Deep Learning book (available at deeplearningbook.org), the authors mention that when we have both hidden-to-hidden and output-to-hidden feedback in recurrent neural networks, we can use both backpropagation through time (BPTT) and teacher forcing as learning methods. I think the main advantage of teacher forcing is to parallelize training of different time … Read more

## Proper loss function

I’m working with a dataset of product sales in a supermarket. I’m using a large neural network (by large I mean it has many inputs compared to the number of training instances) to forecast the sales, but I would like to reduce the input dimension using an autoencoder. Let me describe the input data … Read more
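The dimension-reduction idea can be sketched in miniature with a tied-weight *linear* autoencoder trained by gradient descent (an assumption for illustration; a real setup would use a deep nonlinear encoder/decoder in Keras, and the data here is random):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))   # 200 instances, 20 input features (synthetic)
k = 5                            # bottleneck size (assumed)

# Tied weights: encode z = X W, reconstruct X_hat = z W^T.
W = rng.normal(scale=0.1, size=(20, k))

def recon_error(W):
    Z = X @ W
    return float(np.mean((X - Z @ W.T) ** 2))

err0 = recon_error(W)
lr = 0.05
for _ in range(300):                     # plain gradient descent on MSE
    Z = X @ W
    R = Z @ W.T - X                      # reconstruction residual
    grad = (X.T @ R @ W + R.T @ X @ W) * (2 / X.size)
    W -= lr * grad
print(recon_error(W) < err0)  # True: reconstruction improves
```

After training, the bottleneck activations `X @ W` (dimension `k` instead of 20) would be fed to the forecasting network in place of the raw inputs.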

## Comparing different small sample sizes

I sadly don’t know a lot about statistics, which is why I hope you can help me. I’m currently comparing two different configurations of a neural network. My data points are the standard deviations of the neuron activities in the hidden layer after a particular set of inputs. Configuration A has only 3 neurons in the hidden layer … Read more
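For comparing two small samples with possibly unequal variances, a reasonable default is Welch's t-test (a nonparametric option like Mann-Whitney U is safer if normality is doubtful). A sketch of the statistic itself, on made-up numbers:

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return float(t), float(df)

# Identical samples give t = 0; a clear mean shift gives a large |t|.
t0, _ = welch_t([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
t1, _ = welch_t([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
print(t0, t1)  # 0.0, then a large negative value
```

In practice one would use `scipy.stats.ttest_ind(a, b, equal_var=False)` to get the p-value directly rather than hand-rolling the statistic.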

## Batch normalization – how to compute mean and standard deviation

I am trying to understand how batch normalization works. If I understand it correctly, when using batch normalization in a certain layer, the activations of all units/neurons in that layer are normalized to have zero mean and unit variance. Is that correct? If so, how do I calculate the mean and variance? As a main … Read more
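The short answer is: per unit, across the mini-batch. Each unit gets one mean and one variance, computed over the batch dimension. A minimal sketch of the normalization step (without the learnable scale/shift):

```python
import numpy as np

def batch_norm(X, eps=1e-5):
    """Normalize each feature (column) over the batch (rows):
    one mean and one variance per unit, computed across the mini-batch."""
    mu = X.mean(axis=0)        # per-unit mean over the batch
    var = X.var(axis=0)        # per-unit variance over the batch
    return (X - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(64, 10))   # batch of 64, 10 units
Xn = batch_norm(X)
print(np.abs(Xn.mean(axis=0)).max())   # ~0
print(Xn.var(axis=0).max())            # ~1
```

The full layer additionally learns a per-unit scale and shift (gamma, beta) applied after this normalization, and at inference time it uses running averages of the batch statistics instead of the current batch's.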