I'm training an LSTM for a sequence classification task (examples of 30 timesteps; the input mask is active for all but the last step, and the output mask is active only on the last step).
Training goes well, and so does prediction on a test set of other 30-step examples.
Then I want to use it on a continuous stream of live inputs (using rnnTimeStep), and I expect the output at each step T to be equivalent to what I would get by running the network on the sequence [T-30…T]. Instead, the network behaves as expected for the first 30-ish timesteps, then settles into a stable state and stops responding to recent data, as if the internal state of the network had grown to outweigh the recent input.
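To make the equivalence I expect concrete, here is a toy sketch (plain Python, not DL4J; a linear recurrence stands in for the LSTM). A step-by-step stateful run matches a 30-step windowed run only insofar as the state's memory of inputs older than 30 steps has decayed:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=200)  # a long "live" input stream

def run(seq, h0=0.0, a=0.5):
    # toy recurrent state update: h_t = a * h_{t-1} + x_t
    h = h0
    for x in seq:
        h = a * h + x
    return h

# streaming: carry state across the whole history (like rnnTimeStep)
h_stream = run(xs)

# windowed: restart from zero state on just the last 30 inputs
h_window = run(xs[-30:])

# with a < 1, the contribution of inputs older than 30 steps decays
# as a**30, so the two runs agree closely here; if old inputs did
# not decay, the streaming state would drift away from the window
print(abs(h_stream - h_window))
```

In my case it behaves as if that decay never happens, so the streaming state ends up dominated by the accumulated history instead of the recent window.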
Is such continuous prediction possible?