Understanding LSTM topology

Like many others, I found the resources here and here immensely useful for understanding LSTM cells. I am confident I understand how values flow and are updated, and confident enough to add the mentioned "peephole connections" and so on as well.

In my example, I have at each time step an input vector of length $i$ and an output vector of length $o$, where $o < i$.

What neither page really covered is how these are arranged and trained.

I have 2 questions:

  1. In my training data, I have a lot of input/output vector pairs corresponding to many, many time units. Suppose I train the LSTM with all the data. Can I then run an arbitrary length input set through it? What I mean is, if I have training data for, say, the whole of 2015 and 2016, can I then run data through the network for 2017? Or perhaps 2017 through 2020?
  2. From what I've read, it feels like I have one LSTM cell per time unit, so if I have many time units then I have many chained LSTM cells. Since the length of the chain depends on the length of the data I want to run through the network, and that is presumably arbitrary, I cannot see how I would train this, unless I only train a single LSTM cell which is then duplicated a number of times. So it seems like I would train a single LSTM cell, and then chain $n$ of them together for a given input vector list of length $n$? Even though a single LSTM cell contains a number of elements and functions, it feels like something so small isn't enough to capture so much information.

Thanks. Are there any other resources I can consume (relatively quickly) that will help me understand the details of implementation? The two links above give a fantastic high-level picture of what's going on but don't capture these finer details.

Answer

Suppose I train the LSTM with all the data. Can I then run an arbitrary length input set through it?

Abstractly, yes. However, some software implementations impose hard rules about input sizes: some require fixed-length sequences, while others accept variable lengths. So in terms of programming, you'll have to check that your framework supports what you intend to do.
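For concreteness, here's a minimal sketch (assuming PyTorch; the sizes, sequence lengths, and random data are made up for illustration) showing that one set of LSTM weights can be run over sequences of different lengths:

```python
# A sketch assuming PyTorch: the same LSTM weights accept sequences of
# any length, because the parameters don't depend on sequence length.
import torch
import torch.nn as nn

i, o = 8, 3  # hypothetical input/output vector lengths, with o < i
lstm = nn.LSTM(input_size=i, hidden_size=o, batch_first=True)
# (in practice this model would already be trained, e.g. on 2015-2016 data)

one_year = torch.randn(1, 365, i)     # e.g. daily vectors for 2017
four_years = torch.randn(1, 1461, i)  # e.g. daily vectors for 2017-2020

out1, _ = lstm(one_year)    # shape: (1, 365, o)
out4, _ = lstm(four_years)  # shape: (1, 1461, o)
print(out1.shape, out4.shape)
```

Frameworks that pad batches to a fixed length add some bookkeeping, but the underlying weights are length-agnostic.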

So it seems like I would train a single LSTM cell, and then chain $n$ of them together for a given input vector list of length $n$?

No. Each cell processes all time units. That's what makes them recurrent: the cell processes an input $x_t$ by updating the cell's memory state, and the state at the next time unit is a function of the previous memory state and the new input $x_{t+1}$.
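To make the weight sharing concrete, here is a sketch of a plain LSTM cell (NumPy, no peephole connections; the function name and gate layout are illustrative) unrolled over a sequence. The same parameters $W$, $U$, and $b$ are applied at every time step, so training "one cell" trains the whole chain, and the chain can later be unrolled to any length:

```python
# A sketch of one LSTM cell unrolled over time (NumPy, no peepholes).
# The SAME weights W, U, b are reused at every step, so sequence length
# is not baked into the trained parameters.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, U, b, h0, c0):
    """xs: (T, i) inputs; W: (4o, i), U: (4o, o), b: (4o,) shared weights."""
    h, c = h0, c0
    o = h.shape[0]
    outputs = []
    for x in xs:                          # one iteration per time unit
        z = W @ x + U @ h + b             # all four gate pre-activations
        f = sigmoid(z[0:o])               # forget gate
        i_g = sigmoid(z[o:2*o])           # input gate
        o_g = sigmoid(z[2*o:3*o])         # output gate
        g = np.tanh(z[3*o:4*o])           # candidate memory
        c = f * c + i_g * g               # update the memory (cell) state
        h = o_g * np.tanh(c)              # new hidden state / output
        outputs.append(h)
    return np.stack(outputs), h, c
```

Backpropagation through time works the same way: gradients from every time step accumulate into the single shared set of weights, which is why training one cell is enough. The cell's capacity is set by the dimensionality of its state and weight matrices, not by the number of time steps it is unrolled over.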

Attribution
Source: Link, Question Author: AKrip4k, Answer Author: Sycorax
