Hi
I have a network with an LSTM layer wrapped in a Bidirectional wrapper.
During training the network learns very well and the score converges to zero.
But when I run the trained network to get predictions using the output() method, the results are wrong.
It seems output() does not behave like rnnTimeStep() (I verified this with a network without the Bidirectional layer).
So, since we can't call rnnTimeStep() on bidirectional layers, how can we get correct output from the net?
Any advice would be greatly appreciated.
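For reference, this is roughly the sampling pattern that works for me with rnnTimeStep() on a plain (non-bidirectional) net. It is a simplified sketch based on the dl4j character modelling example, written here for a MultiLayerNetwork and a single sample; sampleFromDistribution is my existing helper and CharacterIterator is the iterator from that example:

private String sampleWithRnnTimeStep(MultiLayerNetwork net, CharacterIterator iter,
                                     INDArray initializationInput, int charactersToSample, Random rng) {
    StringBuilder sb = new StringBuilder();
    net.rnnClearPreviousState();
    // rnnTimeStep() keeps the layer state between calls, so each call only needs the NEW time step
    INDArray output = net.rnnTimeStep(initializationInput);
    output = output.tensorAlongDimension((int) output.size(2) - 1, 1, 0);   // last time step only
    for (int i = 0; i < charactersToSample; i++) {
        double[] dist = new double[iter.totalOutcomes()];
        for (int j = 0; j < dist.length; j++) dist[j] = output.getDouble(0, j);
        int idx = sampleFromDistribution(dist, rng);                        // my existing helper
        sb.append(iter.convertIndexToCharacter(idx));
        INDArray nextInput = Nd4j.zeros(1, iter.inputColumns());            // one-hot, single time step
        nextInput.putScalar(new int[]{0, idx}, 1.0f);
        output = net.rnnTimeStep(nextInput);                                // stateful forward pass
    }
    return sb.toString();
}

output() does not keep any state between calls, so the same loop with output() in place of rnnTimeStep() only ever sees one character at a time.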
@vahed-p could you post code to run? Sorry, it's hard to respond to questions when we just get something like "guess my problem by reading my mind :)"; it's just not a lot to go off of.
Here is the function I use to make predictions and rebuild the sequences of characters:
private String[] sampleCharactersFromNetwork_REVERSE(ComputationGraph net,
                                                     CharacterIterator iter,
                                                     Random rng,
                                                     int charactersToSample,
                                                     int initialstringslength) {
    // Seed each StringBuilder with the (reversed) tail of the corresponding initialization string
    StringBuilder[] sb = new StringBuilder[nSamplesToGenerate];
    for (int i = 0; i < nSamplesToGenerate; i++) {
        sb[i] = new StringBuilder(initialstrings[i].substring(
                initialstrings[0].length() - initialstringslength, initialstrings[0].length()));
        sb[i].reverse();
    }

    int iterations = 0;
    while (initilstringsOFFSETs.size() > 0) {
        // Build the initialization input for the next mini-batch and reset the RNN state
        INDArray initializationInput = CreateCharacterInit_REVERSE(iter, initialstringslength);
        net.rnnClearPreviousState();

        // Forward pass over the initialization sequence; keep only the last time step
        INDArray[] output1 = net.output(initializationInput);
        INDArray output = output1[0].tensorAlongDimension((int) (output1[0].size(2) - 1), 1, 0);

        for (int i = 0; i < (charactersToSample - initialstringslength); i++) {
            // One-hot input for the next time step, shape [miniBatchSize, nCharacters, 1]
            INDArray nextInput = Nd4j.zeros(characterminiBatchSize, iter.inputColumns(), 1);
            for (int s = iterations * characterminiBatchSize; s < (iterations + 1) * characterminiBatchSize; s++) {
                // Sample the next character from the predicted distribution for this example
                double[] outputProbDistribution = new double[iter.totalOutcomes()];
                for (int j = 0; j < outputProbDistribution.length; j++)
                    outputProbDistribution[j] = output.getDouble(s - iterations * characterminiBatchSize, j);
                int sampledCharacterIdx = sampleFromDistribution(outputProbDistribution, rng);

                // Prepare next time step input (time index 0: the input has a single time step)
                nextInput.putScalar(new int[]{s - iterations * characterminiBatchSize, sampledCharacterIdx, 0}, 1.0f);

                // Add sampled character to StringBuilder (human readable output)
                sb[s].append(iter.convertIndexToCharacter(sampledCharacterIdx));
            }
            // Forward pass on the single sampled time step only
            output1 = net.output(nextInput);
            output = output1[0];
        }
        iterations++;
    }

    String[] out = new String[nSamplesToGenerate];
    for (int i = 0; i < nSamplesToGenerate; i++) out[i] = sb[i].reverse().toString();
    return out;
}
In the training phase, the network learns very well.
But when sampling from the trained network with the above code (using the output() method), I get bad predictions.
With a network without the Bidirectional layer, sampling via the rnnTimeStep() method, I get much more accurate predictions.
The problem: I have to use the Bidirectional layer because it learns much better, but I can't call rnnTimeStep() on it, and unfortunately the output() method doesn't give results as good as rnnTimeStep(). So what should I do?
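The only workaround I can think of with output() is to re-run the forward pass over the whole sequence generated so far at every step and take only the last time step, since output() is stateless. A rough sketch of that idea (single sample, not verified; sampleFromDistribution is my existing helper):

private String sampleWithFullOutput(ComputationGraph net, CharacterIterator iter,
                                    INDArray initializationInput, int charactersToSample, Random rng) {
    StringBuilder sb = new StringBuilder();
    INDArray sequenceSoFar = initializationInput;                    // shape [1, nCharacters, initLength]
    for (int i = 0; i < charactersToSample; i++) {
        INDArray[] out = net.output(sequenceSoFar);                  // stateless pass over everything so far
        INDArray last = out[0].tensorAlongDimension((int) (out[0].size(2) - 1), 1, 0);
        double[] dist = new double[iter.totalOutcomes()];
        for (int j = 0; j < dist.length; j++) dist[j] = last.getDouble(0, j);
        int idx = sampleFromDistribution(dist, rng);                 // my existing helper
        sb.append(iter.convertIndexToCharacter(idx));
        INDArray oneHot = Nd4j.zeros(1, iter.inputColumns(), 1);     // the new character as a one-hot time step
        oneHot.putScalar(new int[]{0, idx, 0}, 1.0f);
        sequenceSoFar = Nd4j.concat(2, sequenceSoFar, oneHot);       // append along the time dimension
    }
    return sb.toString();
}

I am not sure this even makes sense with a Bidirectional layer, though, since during training the backward direction sees the future characters.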
Since you're doing character generation, could you give me a complete example that verifies what you consider "accurate"? Generated text is, to some extent, pretty subjective. From what I gathered, you're modifying the character generation dl4j example, so I just need to understand the difference; then I can give some pointers.
Yes, I want to generate sequences of symbols.
For example, I give the sequence "CCAACTCTCAAGAAGACCTTACCTTACCAGCTTCCTTAAAGTCTGTCGACGACCTACAACATTTCTTGTTAA" to the network in the training phase. The network learns it very well (the score converges to zero).
Then, in the sampling phase, I give the trained network the initial subsequence "CCAACTCTCAAGAAGACCTT" and use the predictions to reconstruct the original sequence, but I get the sequence "CCAACTCTCAAGAAGACCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGAAAAAAAAAAAA".
But when I remove the Bidirectional wrapper and use the rnnTimeStep() method for sampling, it reconstructs the original sequence exactly. The catch is that there are many more sequences, and it doesn't work well for all of them, because without the Bidirectional layer the network doesn't learn as well as the Bidirectional network.
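For completeness, by "the Bidirectional network" I mean roughly this kind of setup (a simplified sketch, not my actual training config; nCharacters and hiddenSize are placeholders):

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Adam(0.005))
        .graphBuilder()
        .addInputs("input")
        // LSTM wrapped in the Bidirectional wrapper; CONCAT mode doubles the layer's output size
        .addLayer("lstm",
                new Bidirectional(Bidirectional.Mode.CONCAT,
                        new LSTM.Builder().nIn(nCharacters).nOut(hiddenSize).activation(Activation.TANH).build()),
                "input")
        .addLayer("out",
                new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX).nIn(2 * hiddenSize).nOut(nCharacters).build(),
                "lstm")
        .setOutputs("out")
        .build();

The non-bidirectional variant is the same config with the Bidirectional wrapper removed (and nIn of the output layer set to hiddenSize).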
@vahed-p could you give me something complete I could look at, preferably with an eval function? The subjective nature here is exactly why I'm trying to drill down to something more specific about what you're hoping for. I can't compare with the existing nets either (there's no training I can look at to see how the network changes, or anything).
You're making it a bit hard for me to help you. I don't want to have to reverse engineer half your code from small descriptions here. I'd at least like something I can run, even if you just DM it to me.