# RNN with simple dense function

Hi,
I built my first layers and had some initial success. As a second step, I want to take the simple dense layer example for SameDiff and build a functionally equivalent layer for RNN input in NCW format.

The base unit test for the simple feedforward case looked like this:

``````
INDArray     weights = Nd4j.arange(36).reshape(new int[] {6,6});
INDArray     value1  = Nd4j.arange(12).reshape(new int[] {2,6});
INDArray     bias    = Nd4j.ones(6);

SDVariable x1 = sd.var("x1",value1);
SDVariable w  = sd.var("weights",weights);
SDVariable b  = sd.var("bias",bias);

SDVariable out = x1.mmul(w);
``````

Now I am struggling to go one dimension up. I tried this setup:

``````
INDArray     weights = Nd4j.arange(36).reshape(new int[] {6,6});
INDArray     value1  = Nd4j.arange(12).reshape(new int[] {2,6,1});
INDArray     bias    = Nd4j.ones(6);

SDVariable x1 = sd.var("x1",value1);
SDVariable w  = sd.var("weights",weights);
SDVariable b  = sd.var("bias",bias);

SDVariable out = sd.tensorMmul(x1, w, new int[] {1}, new int[] {0});
``````

but the resulting shape is [2,1,6] instead of [2,6,1]. I don't really know whether I have to

• preprocess to 2D
• use different dimensions
• transpose in some way

If somebody could give me a snippet or a hint, it would be appreciated.

Best regards

Thomas
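For readers following along, the [2,1,6] result can be reproduced on paper. The sketch below is plain Java with no ND4J calls; it assumes `tensorMmul` follows the usual tensordot convention (the non-contracted axes of the first input, then the non-contracted axes of the second), which matches the shape reported above. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TensorDotShape {

    /** Result shape of a tensordot: free axes of a (in order), then free axes of b. */
    public static long[] tensorDotShape(long[] a, long[] b, int[] axesA, int[] axesB) {
        if (axesA.length != axesB.length) {
            throw new IllegalArgumentException("need the same number of axes on both sides");
        }
        for (int i = 0; i < axesA.length; i++) {
            if (a[axesA[i]] != b[axesB[i]]) {
                throw new IllegalArgumentException("contracted dimensions differ");
            }
        }
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (!contains(axesA, i)) out.add(a[i]);
        }
        for (int i = 0; i < b.length; i++) {
            if (!contains(axesB, i)) out.add(b[i]);
        }
        return out.stream().mapToLong(Long::longValue).toArray();
    }

    /** Shape after a permute: output axis i takes the size of input axis order[i]. */
    public static long[] permuteShape(long[] shape, int[] order) {
        long[] out = new long[shape.length];
        for (int i = 0; i < order.length; i++) {
            out[i] = shape[order[i]];
        }
        return out;
    }

    private static boolean contains(int[] axes, int axis) {
        for (int a : axes) {
            if (a == axis) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] x = {2, 6, 1}; // [batch, nIn, steps]
        long[] w = {6, 6};    // [nIn, nOut]
        long[] raw = tensorDotShape(x, w, new int[] {1}, new int[] {0});
        System.out.println(Arrays.toString(raw));                                    // [2, 1, 6]
        System.out.println(Arrays.toString(permuteShape(raw, new int[] {0, 2, 1}))); // [2, 6, 1]
    }
}
```

Under that assumption the contraction removes the nIn axis from the input and appends nOut at the end, so the time axis ends up in the middle. A permute with order {0,2,1} afterwards would restore the [batch, nOut, steps] layout.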

@thomas could you post the full configuration? Are you trying to embed this in a dl4j samediff layer? If so, there are some overlapping techniques you could use depending on the layer. RNNs themselves are available in samediff, and that is much closer to what we run under the covers in the dl4j layers.

Hey,

thanks for your reply. My class looks like this at the moment:

``````
public class RNNFeedForward extends SameDiffLayer {

/**
* serialization id
*/
private static final long serialVersionUID = -1615438632558936374L;

private int nIn;

private int nOut;

/**
* default constructor
*/
public RNNFeedForward(int nIn,int nOut) {
this.nIn  = nIn;
this.nOut = nOut;
}

@Override
public SDVariable defineLayer(SameDiff sd, SDVariable layerInput, Map<String, SDVariable> paramTable, SDVariable mask) {
SDVariable weights = paramTable.get(DefaultParamInitializer.WEIGHT_KEY);
SDVariable bias    = paramTable.get(DefaultParamInitializer.BIAS_KEY);

SDVariable mmul = sd.transpose(sd.tensorMmul(layerInput, weights, new int[] {1}, new int[] {1}));
mmul.reshape(layerInput.shape());

return sd.nn.relu(a, 0);
}

@Override
public void defineParameters(SDLayerParams params) {
// dense layer parameter
}

@Override
public void initializeParameters(Map<String, INDArray> params) {
params.get(DefaultParamInitializer.BIAS_KEY).assign(0);
initWeights(nIn, nOut, weightInit, params.get(DefaultParamInitializer.WEIGHT_KEY));
}

@Override
public InputType getOutputType(int layerIndex, InputType inputType) {
return inputType;
}

@JsonProperty
public int getnIn() {
return nIn;
}

@JsonProperty
public void setnIn(int nIn) {
this.nIn = nIn;
}

@JsonProperty
public int getnOut() {
return nOut;
}

@JsonProperty
public void setnOut(int nOut) {
this.nOut = nOut;
}

}
``````

I am trying to use the same weight matrix for every time step.

@thomas what about your rnn configuration? You can actually configure NCW/NWC and use both depending on what your expected ordering is.

That or you can just permute the result to what you expect to be the final output. Just make sure it’s consistent. In that case just use samediff.permute for your result.

Either way should be fine.

That is more of a second step. First I have to figure out how to correctly multiply the weights with the input matrix to get the correct shape and values.

I tried it with a reshape, and will check inside the layer later, using:

``````
SameDiff sd = SameDiff.create();

int nIn = 2;
int nOut = 3;

int batch = 2;
int steps = 4;

INDArray     weights = Nd4j.ones(nIn * nOut).reshape(new int[] {nIn,nOut});
INDArray     value1  = Nd4j.arange(nIn * batch * steps).reshape(new int[] {batch,nIn,steps});
INDArray     bias    = Nd4j.ones(1,nOut,1);

SDVariable x1 = sd.var("x1",value1);
SDVariable w  = sd.var("weights",weights);
SDVariable b  = sd.var("bias",bias);

SDVariable out = sd.tensorMmul(sd.transpose(x1), w, new int[] {1}, new int[] {0});

out = out.permute(new int[] {1,2,0});
``````

Thanks for the hint.

OK, I tried a model with a basic, working iterator from other models and got the following error during backpropagation:

Exception in thread "main" java.lang.RuntimeException: ShapeUtils::evalShapeForTensorDot method: the numbers of a axes and b axes to make dot product along must have identical values !
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2067)
at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6554)
at org.nd4j.autodiff.samediff.internal.InferenceSession.doExec(InferenceSession.java:801)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getOutputs(InferenceSession.java:255)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getOutputs(InferenceSession.java:68)
at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:533)
at org.nd4j.autodiff.samediff.SameDiff.directExecHelper(SameDiff.java:2927)
at org.nd4j.autodiff.samediff.SameDiff.batchOutputHelper(SameDiff.java:2870)
at org.nd4j.autodiff.samediff.SameDiff.batchOutputHelper(SameDiff.java:2841)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doBackward(LayerVertex.java:148)

I removed all other components and reduced the calculation in defineLayer to:

``````
@Override
public SDVariable defineLayer(SameDiff sd, SDVariable layerInput, Map<String, SDVariable> paramTable, SDVariable mask) {
    SDVariable weights = paramTable.get(DefaultParamInitializer.WEIGHT_KEY);

    SDVariable activation = sd.tensorMmul(sd.transpose(layerInput), weights, new int[] {1}, new int[] {0});
    SDVariable dimact     = activation.permute(new int[] {1,2,0});

    return sd.nn.relu(dimact, 0);
}
``````

Is there a way to debug the real input and output shapes when the graph is instantiated?

Best regards

Thomas

I looked for a hint inside your code and used the approach from the recurrent attention layer, which works. But I keep thinking there must be a better way to achieve the goal:

``````
@Override
public SDVariable defineLayer(SameDiff sd, SDVariable layerInput, Map<String, SDVariable> paramTable, SDVariable mask) {
    SDVariable weights = paramTable.get(DefaultParamInitializer.WEIGHT_KEY);
    SDVariable bias    = paramTable.get(DefaultParamInitializer.BIAS_KEY);
    //SDVariable gain    = paramTable.get(DefaultParamInitializer.GAIN_KEY);

    // split the input along the time axis (dimension 2)
    long[]             shape = layerInput.getShape();
    SDVariable[] inputSlices = sd.unstack(layerInput, 2, (int)shape[2]);
    int            timeSteps = inputSlices.length;

    SDVariable[] outputSlices = new SDVariable[timeSteps];
    for (int i=0;i<timeSteps;i++) {
        // apply the shared weight matrix to this time slice
        outputSlices[i] = inputSlices[i].mmul(weights);
        outputSlices[i] = sd.expandDims(outputSlices[i], 2);
    }
    SDVariable out = sd.concat(2, outputSlices);

    //out = sd.nn.layerNorm(out, gain, false, new int[] {1});

    SDVariable relu = sd.nn.relu(out, 0);

    return relu;
}

``````

If anybody has a suggestion for how to achieve this with a simple multiplication, that would be nice.

Best regards

Thomas

@thomas You can usually see an SDVariable's shape with SDVariable.shape().eval(). I would recommend debugging during the execution/.output calls. You can put some miscellaneous debug code in there, e.g. you could call
INDArray shape = someVar.shape().eval();

Usually you'll need some dummy input though. Any graph that relies on placeholders needs some sort of structure to determine shapes. If you define a shape for a placeholder, then it's possible, but in the context of the layer itself there are no guarantees as to what the input shape is. The samediff graph here is also self-contained relative to the rest of the network, unfortunately. It would take some work (likely more than it's worth) to improve that.

Regarding your attempts at tensor matrix multiply: if you can extract the parameters and dimensions you're trying into a self-contained environment, where I don't have to guess all your variables and can just run it myself, I'd be happy to look at the result and help you out.

All I know from your inputs there is that the tensor matrix multiply is failing due to some sort of invalid dimensions you're specifying.

Beyond that, maybe you could also look at doing batch matrix multiply? That would allow input * x for each time slice to happen in parallel.
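To have something to compare a SameDiff implementation against, the per-time-slice multiply can be written out by hand. This is a plain-Java sketch, not ND4J code: the class name `SharedDenseOverTime` is made up, and it assumes the NCW layout [batch, nIn, steps] and the c-ordered `arange` values used in the unit test earlier in the thread.

```java
import java.util.Arrays;

public class SharedDenseOverTime {

    /**
     * Applies one weight matrix w [nIn][nOut] to every time step of
     * x [batch][nIn][steps] and returns out [batch][nOut][steps].
     */
    public static double[][][] apply(double[][][] x, double[][] w) {
        int batch = x.length;
        int nIn = w.length;
        int nOut = w[0].length;
        int steps = x[0][0].length;
        if (x[0].length != nIn) {
            throw new IllegalArgumentException("input feature size must match the weight rows");
        }
        double[][][] out = new double[batch][nOut][steps];
        for (int b = 0; b < batch; b++) {
            for (int t = 0; t < steps; t++) {
                for (int o = 0; o < nOut; o++) {
                    double sum = 0;
                    for (int i = 0; i < nIn; i++) {
                        sum += x[b][i][t] * w[i][o]; // same w for every time step t
                    }
                    out[b][o][t] = sum;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int batch = 2, nIn = 2, nOut = 3, steps = 4;
        double[][][] x = new double[batch][nIn][steps];
        int v = 0;
        for (int b = 0; b < batch; b++)
            for (int i = 0; i < nIn; i++)
                for (int t = 0; t < steps; t++)
                    x[b][i][t] = v++; // 0..15 in c order
        double[][] w = new double[nIn][nOut];
        for (double[] row : w) Arrays.fill(row, 1.0);

        double[][][] out = apply(x, w);
        System.out.println(Arrays.toString(out[0][0])); // [4.0, 6.0, 8.0, 10.0]
    }
}
```

With all-ones weights every output channel is just the sum over the input channels at that time step, which makes a mismatch in axis order easy to spot when comparing against the evaluated SameDiff output.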

Hey, thanks for your answer. I checked the dimensions in a unit test but didn't see any problems there. I will check a second time tomorrow.

The idea of batchMmul seems nice. I set up a unit test to check the operations and ran into a problem; I don't really know whether it is a bug (it seems to be fixed in the snapshot).

``````
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.999 sec <<< FAILURE!
normalizeBatchLayerTest(simple.generator.test.AttentionTest)  Time elapsed: 2.963 sec  <<< ERROR!
java.lang.NullPointerException: Cannot load from long array because "firstShape" is null
at org.nd4j.linalg.api.ops.impl.reduce.custom.BatchMmul.<init>(BatchMmul.java:79)
at org.nd4j.linalg.api.ops.impl.reduce.custom.BatchMmul.<init>(BatchMmul.java:48)
at org.nd4j.autodiff.samediff.ops.SDBaseOps.batchMmul(SDBaseOps.java:384)
at simple.generator.test.AttentionTest.normalizeBatchLayerTest(AttentionTest.java:85)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
``````

The test looks like this:

``````
SameDiff sd = SameDiff.create();

int nIn = 2;
int nOut = 3;

int batch = 2;
int steps = 2;

INDArray     weights = Nd4j.ones(nIn * nOut).reshape(new int[] {nIn,nOut});
INDArray     value1  = Nd4j.arange(nIn * batch * steps).reshape(new int[] {batch,nIn,steps});

SDVariable x1 = sd.var("x1",value1);
SDVariable w  = sd.var("weights",weights);

long[] shape = x1.getShape();
SDVariable[] inputSlices = sd.unstack(x1, 2, (int)shape[2]);

SDVariable[] outputSlices = sd.batchMmul(inputSlices, new SDVariable[] {w,w});
``````

As I understand the documentation, both arrays must have the same length (here 2) and the dimensions must fit, which I checked beforehand with a simple mmul loop. However, the exception points at line 79, where the repository (master) only has the beginning of a constructor. So either there is already a bug fix, or I don't really know what causes the error. I checked both parameters and nothing is null there.

Best regards

Thomas

EDIT: I tried the snapshots, but they seem to be out of date. The Sonatype repository is configured as described in the documentation.

@thomas ah yeah. Release will be going out soon so this shouldn’t be a big deal. I’ll publish snapshots by early next week after my last round of fixes gets merged.

Thanks for the info. I will check next week and will train with the current model for now. Finally, my first complete transformer block!

@thomas great! Glad you were able to get up and running! Thanks for asking when you were stuck. I’ll work on publishing some better examples on transformers soon!

I tried to put together a first layer definition from my separate classes for the different tasks; it might help somebody (WARNING: not tested, and I am not sure it works). These are only first steps, the weight init is not finished, and there is no debugging yet, so use at your own risk:

``````
/**
* first try to implement a complete transformer in one layer
*
* @author mrrobot
*
*/
public class TransformerLayer extends SameDiffLayer {

/**
* serialization id
*/
private static final long serialVersionUID = 5974498113062600619L;

// param names
private static final String WEIGHT_KEY_QUERY_PROJECTION = "Wq";
private static final String WEIGHT_KEY_KEY_PROJECTION = "Wk";
private static final String WEIGHT_KEY_VALUE_PROJECTION = "Wv";
private static final String WEIGHT_KEY_OUT_PROJECTION = "Wo";

private static final String WEIGHT_KEY_FFN = "Wffn";

private int embWidth;

private int seqLen;

/**
* builder constructor
*
* @param builder
*/
public TransformerLayer(Builder builder) {
embWidth = builder.embWidth;
seqLen   = builder.seqLen;
}

@Override
public SDVariable defineLayer(SameDiff sameDiff, SDVariable layerInput, Map<String, SDVariable> paramTable, SDVariable mask) {
SDVariable result;
SDVariable attention;

SDVariable Wffn  = paramTable.get(WEIGHT_KEY_FFN);

// first multi head attention dot product
SDVariable Wq = paramTable.get(WEIGHT_KEY_QUERY_PROJECTION);
SDVariable Wk = paramTable.get(WEIGHT_KEY_KEY_PROJECTION);
SDVariable Wv = paramTable.get(WEIGHT_KEY_VALUE_PROJECTION);
SDVariable Wo = paramTable.get(WEIGHT_KEY_OUT_PROJECTION);

attention = sameDiff.nn.multiHeadDotProductAttention(getLayerName(), layerInput, layerInput, layerInput, Wq, Wk, Wv, Wo, mask, true);
}else{
attention = sameDiff.nn.dotProductAttention(getLayerName(), layerInput, layerInput, layerInput, mask, true);
}

SDVariable norm1 = sameDiff.nn.layerNorm(add1, gain1, false, new int[] {1});

// ffn network part
long[]             shape = layerInput.getShape();
SDVariable[] inputSlices = sameDiff.unstack(layerInput, 2, (int)shape[2]);
int            timeSteps = inputSlices.length;

SDVariable[] outputSlices = new SDVariable[timeSteps];
for (int i=0;i<timeSteps;i++) {
outputSlices[i] = inputSlices[i].mmul(Wffn);
outputSlices[i] = sameDiff.expandDims(outputSlices[i], 2);
}
SDVariable ffnout = sameDiff.concat(2, outputSlices);

return result;
}

@Override
public void defineParameters(SDLayerParams params) {
params.clear();

// check for multi head attention parameter
}

// ffn parameters

// layer normalization parameter
}

@Override
public void initializeParameters(Map<String, INDArray> params) {
try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().scopeOutOfWorkspaces()) {
for (Map.Entry<String, INDArray> e : params.entrySet()) {
if(e.getKey().equals(WEIGHT_KEY_OUT_PROJECTION)){
WeightInitUtil.initWeights(embWidth, headSize, e.getValue().shape(), weightInit, null, 'c', e.getValue());
}
}
}

}

/**
* ensure NCW input format to be compatible with attention implementation
*/
@Override
public InputPreProcessor getPreProcessorForInputType(InputType inputType) {
return InputTypeUtil.getPreprocessorForInputTypeRnnLayers(inputType, RNNFormat.NCW,getLayerName());
}

/**
* configure and info about output type  RNNFormat.NCW with seqLen and embWidth Size
*/
@Override
public InputType getOutputType(int layerIndex, InputType inputType) {
if (inputType == null || inputType.getType() != InputType.Type.RNN) {
throw new IllegalStateException("Invalid input for transformer layer (layer index = " + layerIndex
+ ", layer name = \"" + getLayerName() + "\"): expect RNN input type with size > 0. Got: "
+ inputType);
}

return InputType.recurrent(embWidth, seqLen);
}

/**
* util class to build a custom transformer layer
*/
public static class Builder {

public int embWidth;

public int seqLen;

public Builder embWidth(int width) {
embWidth = width;
return this;
}

return this;
}

public Builder seqLen(int len) {
seqLen = len;
return this;
}

public TransformerLayer builder() {
return new TransformerLayer(this);
}

}

}
``````

Hi,

I also tried to replace the given code with a second permute snippet that works in a unit test (eval only, no backprop). I cleaned up some errors with the RNN data format for the output layer, but I still get some errors:

``````
@Test
public void rnnAsFFNTest() {
    SameDiff sd = SameDiff.create();

    int nIn = 3;
    int nOut = 5;

    int batch = 2;
    int steps = 4;

    INDArray     weights = Nd4j.ones(nIn * nOut).reshape(new int[] {nIn,nOut});
    INDArray     value1  = Nd4j.arange(nIn * batch * steps).reshape(new int[] {batch,nIn,steps});

    SDVariable x1 = sd.var("x1",value1);
    SDVariable w  = sd.var("weights",weights);

    SDVariable xp = sd.permute(x1, new int[] {0,2,1});
    SDVariable out = sd.tensorMmul(xp, w, new int[] {2}, new int [] {0});
    SDVariable outp = sd.permute(out, new int[] {0,2,1});

    System.out.println(x1.eval().shapeInfoToString());
    System.out.println(outp.eval().shapeInfoToString());
}
``````

I think the dimensions should fit in general, but I got the same stack trace:

``````
[main] ERROR org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner - Failed to execute op tensormmul_bp. Attempted to execute with 3 inputs, 2 outputs, 0 targs,0 bargs and 4 iargs. Inputs: [(FLOAT,[16,80,128],c), (FLOAT,[128,128],c), (FLOAT,[16,80,128],c)]. Outputs: [(FLOAT,[16,80,128],c), (FLOAT,[128,128],c)]. tArgs: -. iArgs: [1, 1, 1, 2]. bArgs: -. Input var names: [permute, Wffn, tensordot-grad]. Output var names: [permute-grad, Wffn-grad] - Please see above message (printed out from c++) for a possible cause of error.
[main] WARN org.deeplearning4j.earlystopping.trainer.BaseEarlyStoppingTrainer - Early stopping training terminated due to exception at epoch 0, iteration 0
java.lang.RuntimeException: Op with name tensormmul_bp and op type [tensormmul_bp] execution failed with message ShapeUtils::evalShapeForTensorDot method: the dimensions at given axes for both input arrays must be the same !
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1905)
at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6531)
at org.nd4j.autodiff.samediff.internal.InferenceSession.doExec(InferenceSession.java:491)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getOutputs(InferenceSession.java:218)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getOutputs(InferenceSession.java:60)
at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:391)
at org.nd4j.autodiff.samediff.SameDiff.directExecHelper(SameDiff.java:2754)
at org.nd4j.autodiff.samediff.SameDiff.batchOutputHelper(SameDiff.java:2722)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doBackward(LayerVertex.java:148)
``````

I found most of the names, permute and Wffn, in my code, but not “tensordot-grad”. Can anybody explain where it comes from?

@thomas
At the beginning of every samediff method call you can specify a name for the variable. Feel free to do that to make it a little more debuggable.

It looks like there's some mismatch. A “-grad” variable is just a variable that gets created during training when attempting to do backprop. It stands for “gradient of the variable that precedes the -”.

From there it looks like you're still misusing tensor matrix multiply. The error is right there:

``````
Failed to execute op tensormmul_bp. Attempted to execute with 3 inputs, 2 outputs, 0 targs,0 bargs and 4 iargs. Inputs: [(FLOAT,[16,80,128],c), (FLOAT,[128,128],c), (FLOAT,[16,80,128],c)]. Outputs: [(FLOAT,[16,80,128],c), (FLOAT,[128,128],c)]. tArgs: -. iArgs: [1, 1, 1, 2]. bArgs: -. Input var names: [permute, Wffn, tensordot-grad]. Output var names: [permute-grad, Wffn-grad] - Please see above message (printed out from c++) for a possible cause of error.
``````

In that case, the output of the tensor matrix multiply needs to have the same shape as the variable it is the -grad of.