Bidirectional layer and its output

vahed-p · August 26, 2020, 7:58pm

Hi
I have a network with an LSTM layer wrapped by Bidirectional wrapper.
during training the network learns very well and the score converges to zero.
but when i test the learned network to get predictions, by using the output() method, it works wrong.
it seems output() isn’t work like rnntimestep() (i tested it with a network without bidirectional layer)
So, as we can’t rnntimestep bidirectional layers, how we can get correct output from the net?

its really appreciable if someone give me any advise.

vahed-p · August 27, 2020, 1:49pm

@treo @agibsonccc @AlexBlack
please help me, I really need your help

agibsonccc · August 28, 2020, 11:27am

@vahed-p could you post code to run? Sorry it’s hard to respond to questions when we just get something like: “guess my problem by reading my mind :)” it’s just not a lot to go off of.

vahed-p · August 28, 2020, 10:38pm

Ok. thanks for response and sorry for confusing you.
here is my network configuration:

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
.updater(new Adam().builder().learningRate(0.01).build())
.seed(123)
.miniBatch(false)
.biasInit(0)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.weightInit(WeightInit.XAVIER_UNIFORM)
.graphBuilder()
.addInputs(“input”)
.addLayer(“L1”, new Bidirectional(new LSTM.Builder().nIn(5).nOut(lstmLayerSize).updater(new Adam().builder().learningRate(0.008).build()).activation(Activation.TANH).build()), “input”)
.addLayer(“L2”, new DenseLayer.Builder().nIn(lstmLayerSize*2).nOut(lstmLayerSize).updater(new Adam().builder().learningRate(0.01).build()).activation(Activation.TANH).build(),“L1”)
.addLayer(“L3”,new RnnOutputLayer.Builder().nIn(lstmLayerSize).nOut(iter.inputColumns()).updater(new Adam().builder().learningRate(0.01).build()).build(), “L2”)
.setOutputs(“L3”)
.inputPreProcessor(“L2”, new RnnToFeedForwardPreProcessor())
.inputPreProcessor(“L3”, new FeedForwardToRnnPreProcessor())
.build();

vahed-p · August 28, 2020, 10:47pm

here is the function to make predictions to rebuild sequences of characters.

private String sampleCharactersFromNetwork_REVERSE(ComputationGraph net,
CharacterIterator iter,
Random rng,
int charactersToSample,
int initialstringslength) {

   StringBuilder[] sb = new StringBuilder[nSamplesToGenerate];
    for (int i = 0; i < nSamplesToGenerate; i++) {
        sb[i] = new StringBuilder(initialstrings[i].substring(initialstrings[0].length() - initialstringslength, initialstrings[0].length()));
        sb[i].reverse();
    }
    
    int iterations =0;
    while (initilstringsOFFSETs.size()>0)
    {
        INDArray initializationInput = CreateCharacterInit_REVERSE(iter,initialstringslength);
        net.rnnClearPreviousState();
        INDArray[] output1 = net.output(initializationInput);
        INDArray output = output1[0].tensorAlongDimension((int) (output1[0].size(2) - 1), 1, 0);    
        for (int i = 0; i < (charactersToSample - initialstringslength); i++) {
            INDArray nextInput = Nd4j.zeros(characterminiBatchSize, iter.inputColumns(),1);
            
            for (int s = iterations*characterminiBatchSize; s < (iterations+1)*characterminiBatchSize; s++)
            {
                double[] outputProbDistribution = new double[iter.totalOutcomes()];
                for (int j = 0; j < outputProbDistribution.length; j++)
                    outputProbDistribution[j] = output.getDouble((s-iterations*characterminiBatchSize), j);
                int sampledCharacterIdx =
                        sampleFromDistribution(outputProbDistribution, rng);
                //	Prepare	next	time	step	input
                nextInput.putScalar(new int[]{(s-iterations*characterminiBatchSize), sampledCharacterIdx,1}, 1.0f);
                //Add	sampled	character	to	StringBuilder	(human	readable	output)
                sb[s].append(iter.convertIndexToCharacter(sampledCharacterIdx));
            }

            output1 = net.output(nextInput);
            output = output1[0];
        }
        iterations++;
    }

    String[] out = new String[nSamplesToGenerate];
    for (int i = 0; i < nSamplesToGenerate; i++) out[i] = sb[i].reverse().toString();
    return out;
}

vahed-p · August 28, 2020, 11:03pm

in training phase, network learns very well.
but when sampling from the learned network with the above code (using output( ) method) i get bad predictions.

but using a network without Bidirectional layer and sampling by rnntimestep( ) method, i get more accurate predictions.

Now, problem is: i have to use bidirectional layer because it learns much better, but i can’t rnntimestep it; and unfortunately output( ) method doesn’t work as well as rnntimestep. So what should i do?

agibsonccc · August 29, 2020, 12:33pm

Since you’re doing character generation, could you give me a complete example that verifies what you consider “accurate”? Generated text to some point is pretty subjective. From what I gathered, you’re modifying the character generation dl4j example, so now I just need to understand the difference then I can give some pointers.

vahed-p · August 29, 2020, 8:17pm

Yes, i want to generate sequence of symbols.
for example i give the sequence “CCAACTCTCAAGAAGACCTTACCTTACCAGCTTCCTTAAAGTCTGTCGACGACCTACAACATTTCTTGTTAA” to the network in training phase. network learns it very well (the score converges to zero).
then, in sampling phase, i give the learned network an initial subsequence “CCAACTCTCAAGAAGACCTT” and use the predictions to reconstruct the original sequence but i get the sequence “Yes, i want to generate sequence of symbols.
for example i give the sequence “CCAACTCTCAAGAAGACCTTACCTTACCAGCTTCCTTAAAGTCTGTCGACGACCTACAACATTTCTTGTTAA” to the network in training phase. network learns it very well (the score converges to zero).
then, in sampling phase, i give the learned network an initial subsequence “CCAACTCTCAAGAAGACCTT” and use the predictions to reconstruct the original sequence but i get the sequence “CCAACTCTCAAGAAGACCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGAAAAAAAAAAAA””

vahed-p · August 29, 2020, 9:02pm

But, when i remove the bidirectional, and use the rnntimestep() method for sampling, it reconstruct the original sequence exactly. but the point is that the number of sequences are more and it doesn’t work well for all of them, because in this configuration the network doesn’t learn as good as the Bidirectional network.

agibsonccc · August 30, 2020, 5:25am

@vahed-p could you give me something complete I could look at preferably with an eval function? The subjective nature here is exactly why I’m trying to drill for something more specific on what you’re hoping for. I can’t compare with the existing nets either (there’s no training I can look at to see how the network changes or anything)

You’re making it a bit hard for me to help you. I don’t want to have reverse engineer half your code from small descriptions here. I’d at least like something I can run even if you just DM it to me at least.

vahed-p · August 30, 2020, 1:08pm

ok. thanks again. i will send you my complete code to run.

Topic		Replies	Views
LSTM sequence prediction fail (( DL4J	1	278	February 21, 2023
LSTM - All inputs yield the same output DL4J	1	301	February 1, 2023
What method should i use for a time series prediction? DL4J	0	351	December 29, 2020
Bidirectional LSTM in DL4J based on python example DL4J	11	1663	May 21, 2020
RuntimeException in Bidirectional trainning DL4J	3	376	November 22, 2020

Bidirectional layer and its output

Related topics