"Attention Is All You Need" model implementation using dl4j

Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [1, 6]; dtype: FLOAT; first values [-0.000190735, 9.53674e-05, 5.97688e+07, 0.285804, 0, -0.0568358]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 6]; dtype: FLOAT; first values [3.71111e-07, 8.02352e-06, 1.4324e+17, 1364.42, 0, 28.5823]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 6]; dtype: FLOAT; first values [4.22407e-06, 8.98676e-06, 5.21581e+17, 1373.07, 0, 28.9244]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 6]; dtype: FLOAT; first values [45.7536, 8.31221, 1.67806e+12, 1.63526, 0, 1.93305]
Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [-1.14441e-05, 6.77109e-05]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [2.25376e-07, 2.35602e-06]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [2.27472e-07, 2.42937e-06]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [45.7535, 8.31219]

Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 6]; dtype: FLOAT; first values [45.7534, 8.31222, 1.67966e+12, 1.65916, 5.76711, 1.9537]
Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [-5.72205e-05, -4.00543e-05]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [1.90246e-05, 6.33076e-05]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [1.90959e-05, 6.33426e-05]
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [1, 2]; dtype: FLOAT; first values [45.7535, 8.3122]
Printing traindata dataset shape - 1
[32, 6, 57]
Printing testdata dataset shape - 1
[32, 6, 13]
in.toString() - SDVariable(name="input",variableType=PLACEHOLDER,dtype=FLOAT,shape=[32, 6, -1])
label.toString() - SDVariable(name="label",variableType=PLACEHOLDER,dtype=FLOAT,shape=[32, 2, -1])
======================================================= -
weights.getShapeDescriptor().toString() - [3,32, 6, 2,12, 2, 1,8192,1,c]
======================================================= -
bias.toString() - 0
Printing sd information - 0 -
SameDiff(nVars=9,nOps=5)
--- Summary ---
Variables: 9 (2 with arrays)
Functions: 5
SameDiff Function Defs: 0
Loss function variables: [mse]

--- Variables ---

    - Name - - Array Shape - - Variable Type - - Data Type- - Output Of Function - - Inputs To Functions -
    add - ARRAY FLOAT add(add) [subtract]
    bias VARIABLE FLOAT [add]
    input [32, 6, -1] PLACEHOLDER FLOAT [matmul]
    label [32, 2, -1] PLACEHOLDER FLOAT [subtract]
    matmul - ARRAY FLOAT matmul(matmul) [add]
    mse - ARRAY FLOAT reduce_mean(reduce_mean)
    square - ARRAY FLOAT square(square) [reduce_mean]
    subtract - ARRAY FLOAT subtract(subtract) [square]
    weights [32, 6, 2] VARIABLE FLOAT [matmul]

--- Functions ---
- Function Name - - Op - - Inputs - - Outputs -
0 matmul Mmul [input, weights] [matmul]
1 add AddOp [matmul, bias] [add]
2 subtract SubOp [label, add] [subtract]
3 square Square [subtract] [square]
4 reduce_mean Mean [square] [mse]

ShapeUtils::evalShapeForMatmul static method: input shapes are inconsistent: xDim 28 != yDim 6
Removing variable <1:0>
Removing variable <1:1>
Exception in thread "main" java.lang.RuntimeException: Op matmul with name matmul failed to execute. Here is the error from c++:
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.calculateOutputShape(NativeOpExecutioner.java:1672)
at org.nd4j.linalg.api.ops.DynamicCustomOp.calculateOutputShape(DynamicCustomOp.java:696)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getAndParameterizeOp(InferenceSession.java:1363)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getAndParameterizeOp(InferenceSession.java:68)
at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:531)
at org.nd4j.autodiff.samediff.SameDiff.directExecHelper(SameDiff.java:2927)
at org.nd4j.autodiff.samediff.SameDiff.directExecHelper(SameDiff.java:2890)
at org.nd4j.autodiff.samediff.SameDiff.outputHelper(SameDiff.java:2683)
at org.nd4j.autodiff.samediff.SameDiff.output(SameDiff.java:2518)
at org.nd4j.autodiff.samediff.config.OutputConfig.exec(OutputConfig.java:132)
at org.nd4j.autodiff.samediff.SameDiff.output(SameDiff.java:2473)
at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6.sameDiff2(LocationNextNeuralNetworkV6.java:953)
at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6.main(LocationNextNeuralNetworkV6.java:197)

Did I send you the information you were looking for, or did I
misunderstand what you asked for?

This may be helpful. I expanded the output to print an input features dataset entry and the weights array it is to be multiplied by.
Below you will see the output. The only visible difference between the two is an empty row in some of the features dataset entries. Could this be related to the problem?
As you can see, I am struggling…
Printing traindata dataset shape features and labels
[32, 6, 28]
[32, 2, 28]
======================================================= -
features.toString() - [[[ 0.2400, 0, 0, … 0, 0, 0],
[ -0.2433, 0, 0, … 0, 0, 0],
[ -1.3521, 0, 0, … 0, 0, 0],
[ -0.3702, 0, 0, … 0, 0, 0],
[ -0.5299, 0, 0, … 0, 0, 0],
[ 0.2094, 0, 0, … 0, 0, 0]],

[[ 0.4001, 0.4001, 0.6401, … 0, 0, 0],
[ -0.4937, -0.4937, -0.2790, … 0, 0, 0],
[ -1.3508, -1.3508, -1.3508, … 0, 0, 0],
[ -0.8785, -0.8785, -0.4164, … 0, 0, 0],
[ -0.5299, -0.5299, -0.5299, … 0, 0, 0],
[ 0.2094, 0.2094, 0.2094, … 0, 0, 0]],

[[ 0.6001, 0, 0, … 0, 0, 0],
[ 0, 0, 0, … 0, 0, 0],
[ -1.3462, 0, 0, … 0, 0, 0],
[ -0.4164, 0, 0, … 0, 0, 0],
[ -0.5299, 0, 0, … 0, 0, 0],
[ 0.2094, 0, 0, … 0, 0, 0]],

…,

[[ 0.4401, 0, 0, … 0, 0, 0],
[ -0.4436, 0, 0, … 0, 0, 0],
[ -1.3012, 0, 0, … 0, 0, 0],
[ -0.7861, 0, 0, … 0, 0, 0],
[ -0.5299, 0, 0, … 0, 0, 0],
],

[[ -0.7201, 0, 0, … 0, 0, 0],
[ 0.4793, 0, 0, … 0, 0, 0],
[ -1.3026, 0, 0, … 0, 0, 0],
[ 0.0918, 0, 0, … 0, 0, 0],
[ -0.5299, 0, 0, … 0, 0, 0],
],

[[ 0.3200, 0, 0, … 0, 0, 0],
[ -0.3935, 0, 0, … 0, 0, 0],
[ -1.3012, 0, 0, … 0, 0, 0],
[ 1.8014, 0, 0, … 0, 0, 0],
[ -0.5299, 0, 0, … 0, 0, 0],
]]
======================================================= -
labels.toString() - [[[ -0.1280, 0, 0, … 0, 0, 0],
[ -0.0586, 0, 0, … 0, 0, 0]],

[[ 0, 0, 0, … 0, 0, 0],
[ 0, 0, 0, … 0, 0, 0]],

[[ 0.2561, 0, 0, … 0, 0, 0],
[ 0.1406, 0, 0, … 0, 0, 0]],

…,

[[ 0.0854, 0, 0, … 0, 0, 0],
],

[[ -1.1523, 0, 0, … 0, 0, 0],
],

[[ -0.0427, 0, 0, … 0, 0, 0],
]]
======================================================= -
weights.getShapeDescriptor().toString() - 1 - [3,32, 6, 28,168, 28, 1,8192,1,c]
======================================================= -
[[[ -0.8787, -0.2730, 0.3692, … -0.7391, -0.5481, -0.0243],
[ -0.1612, -0.1481, -0.7068, … 0.1574, 0.7205, -0.9144],
[ 0.6026, 0.2746, 0.3720, … 0.5826, 0.4048, -0.4641],
[ -1.1143, 0.0120, 0.2247, … 0.7621, -0.0117, -0.1253],
[ 0.1147, -0.8161, 0.0143, … -0.3111, 0.6532, -0.3620],
[ 0.0226, 0.5594, -0.8684, … -0.2127, -0.0699, 0.3933]],

[[ -0.1906, -0.7245, 0.7628, … 0.3719, 0.4100, -0.2130],
[ 0.2397, -0.2575, -0.7915, … -0.1151, -0.6772, -0.6307],
[ -0.5629, -0.3701, -0.7255, … -0.0369, -0.0495, 0.1911],
[ 0.7826, 0.0183, 0.9971, … 0.4647, 0.1239, -0.4908],
[ 0.0469, -0.5158, 1.3268, … -0.0152, -0.2591, -0.8504],
[ 0.1247, -0.0932, -0.0954, … 0.1856, -0.3050, -0.5979]],

[[ 0.5216, 0.5035, 0.3466, … -0.3440, 0.2419, 0.1135],
[ 0.1326, -0.7032, 0.5633, … 0.0081, 0.1710, -0.1045],
[ -0.0060, 0.5615, 0.8625, … -0.8102, -0.1918, -0.2438],
[ 0.4377, -0.4910, -0.2934, … -0.2577, -0.1496, 1.0962],
[ 0.3494, 0.3120, 0.4968, … -0.1559, -0.4656, 0.2439],
[ -0.2563, -0.6690, -0.4395, … -0.6395, -0.1103, -1.0487]],

…,

[[ 0.0911, -0.4099, -0.2312, … -0.0353, 0.5298, -0.2573],
[ -0.4229, 0.6112, 0.0431, … 0.2102, -0.0435, 0.2963],
[ 0.2812, -0.1381, -0.1856, … -0.7837, -0.9451, -0.1927],
[ 0.8664, -0.3155, 0.2685, … 0.4391, -0.9770, -0.1411],
[ -0.6029, -0.2591, 0.3935, … -0.2524, -0.5626, 0.9343],
[ -0.6618, -0.4310, 0.2071, … -0.1911, 0.7627, 0.1040]],

[[ -0.2163, 0.3473, -0.0315, … 0.3783, 0.2826, -0.0145],
[ -0.3552, 0.0900, -1.0776, … -0.6926, -0.0005, -0.7441],
[ -0.7482, -1.3268, -0.8074, … 0.5400, 0.6061, -0.5536],
[ -0.1322, 0.2446, 0.3136, … -0.1605, -0.1169, 0.3125],
[ -0.0479, 1.1222, 0.8091, … -0.3317, 0.0927, -0.0997],
[ -0.0838, 0.0034, 0.5361, … -0.2754, 0.5494, -1.3045]],

[[ -0.3480, -0.2745, -0.4577, … -0.4209, 0.2082, 0.0258],
[ -0.0948, -0.0850, -0.1651, … 0.8483, -0.4839, -0.4177],
[ 0.9006, -0.7308, -0.3485, … 0.7823, -0.0449, -0.1500],
[ 0.4681, 0.6678, -1.3167, … -0.8063, -0.6321, -0.2005],
[ 0.1025, 0.4328, -0.0631, … -0.0129, -0.0473, 0.0716],
[ 0.5558, -0.1982, -0.2603, … -0.0546, -0.6212, -0.3587]]]
======================================================= -

@adonnini it doesn’t look complete. The first op is matmul yet your log says add.

The add operations reported in the log I sent you refer to the normalization of the input dataset, which takes place before the samediff operations.
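
(For reference, that normalization step is standard iterator preprocessing along these lines; this is only a sketch, not necessarily the exact code in use, and it assumes NormalizerStandardize:)

    // Sketch of a typical DataSetIterator normalization setup (assumes NormalizerStandardize;
    // the actual preprocessor may differ). The subtract/add ops in the verbose log come from
    // this kind of mean/std computation, not from the SameDiff graph itself.
    // imports: org.nd4j.linalg.dataset.api.preprocessor.DataNormalization,
    //          org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize
    DataNormalization normalizer = new NormalizerStandardize();
    normalizer.fit(trainData);                 // collect mean/std statistics from the training iterator
    trainData.reset();
    trainData.setPreProcessor(normalizer);     // normalize each DataSet as it is returned
    testData.setPreProcessor(normalizer);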

I must be doing something wrong. When I place

    Nd4j.getExecutioner().enableVerboseMode(true);
    Nd4j.getExecutioner().enableDebugMode(true);

after the dataset normalization processing, just before SameDiff.create(), no verbose/debugging information is reported. The log file looks like this:

Printing traindata dataset shape - 1
[32, 6, 57]
Printing testdata dataset shape - 1
[32, 6, 13]

Printing traindata feature and label dataset shape
[32, 6, 28]
[32, 2, 28]
======================================================= -
weights.getShapeDescriptor().toString() - 1 - [3,32, 6, 28,168, 28, 1,8192,1,c]
Printing sd information - 1
SameDiff(nVars=9,nOps=5)
--- Summary ---
Variables: 9 (2 with arrays)
Functions: 5
SameDiff Function Defs: 0
Loss function variables:

--- Variables ---

    - Name - - Array Shape - - Variable Type - - Data Type- - Output Of Function - - Inputs To Functions -
    add - ARRAY FLOAT add(add) [subtract]
    bias VARIABLE FLOAT [add]
    input [32, 6, -1] PLACEHOLDER FLOAT [matmul]
    label [32, 2, -1] PLACEHOLDER FLOAT [subtract]
    matmul - ARRAY FLOAT matmul(matmul) [add]
    mse - ARRAY FLOAT reduce_mean(reduce_mean)
    square - ARRAY FLOAT square(square) [reduce_mean]
    subtract - ARRAY FLOAT subtract(subtract) [square]
    weights [32, 6, 28] VARIABLE FLOAT [matmul]

--- Functions ---
- Function Name - - Op - - Inputs - - Outputs -
0 matmul Mmul [input, weights] [matmul]
1 add AddOp [matmul, bias] [add]
2 subtract SubOp [label, add] [subtract]
3 square Square [subtract] [square]
4 reduce_mean Mean [square] [mse]

ShapeUtils::evalShapeForMatmul static method: input shapes are inconsistent: xDim 28 != yDim 6
Removing variable <1:0>
Removing variable <1:1>
Exception in thread "main" java.lang.RuntimeException: Op matmul with name matmul failed to execute. Here is the error from c++:
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.calculateOutputShape(NativeOpExecutioner.java:1672)
at org.nd4j.linalg.api.ops.DynamicCustomOp.calculateOutputShape(DynamicCustomOp.java:696)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getAndParameterizeOp(InferenceSession.java:1363)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getAndParameterizeOp(InferenceSession.java:68)
at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:531)
at org.nd4j.autodiff.samediff.SameDiff.directExecHelper(SameDiff.java:2927)
at org.nd4j.autodiff.samediff.SameDiff.batchOutputHelper(SameDiff.java:2870)
at org.nd4j.autodiff.samediff.SameDiff.output(SameDiff.java:2835)
at org.nd4j.autodiff.samediff.SameDiff.output(SameDiff.java:2808)
at org.nd4j.autodiff.samediff.config.BatchOutputConfig.output(BatchOutputConfig.java:183)
at org.nd4j.autodiff.samediff.SameDiff.output(SameDiff.java:2764)
at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6.sameDiff2(LocationNextNeuralNetworkV6.java:1068)
at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6.main(LocationNextNeuralNetworkV6.java:198)

@adonnini from the looks of it either your input data is off or your variable declaration is. That is up to you to decide. I will just tell you how to read the error and you decide how you want to set your problem up.

Breaking this down a bit, and please understand that I’m very much asking you to fix this part on your own, since your problem is very much domain specific and may have inherent constraints.

-1 is mainly used for batch sizes. It won’t work for time series when matmul is your first op. Now I’m going to highlight something here: please understand the underlying mechanics and do not ignore this part.

matmul works as I explained above and the error is telling you that. The last dimension (columns) of your first input (x in this case, i.e. your input data) must match the second-to-last dimension (usually rows) of your weights.

You need to decide how to make that match your problem. How you want to do that is up to you. Your weights will always be a fixed size. Your inputs here should be as well.
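
To make that rule concrete with plain ND4J arrays (the shapes below are made up purely for illustration, not taken from your data):

    // imports: org.nd4j.linalg.api.buffer.DataType, org.nd4j.linalg.api.ndarray.INDArray,
    //          org.nd4j.linalg.factory.Nd4j
    INDArray x = Nd4j.rand(DataType.FLOAT, 4, 6);   // x: [4, 6] -> last dimension (columns) is 6
    INDArray w = Nd4j.rand(DataType.FLOAT, 6, 2);   // w: [6, 2] -> second-to-last dimension (rows) is 6
    INDArray y = x.mmul(w);                         // works because 6 == 6; result shape is [4, 2]
    // Nd4j.rand(DataType.FLOAT, 4, 5).mmul(w) would fail for the same reason: 5 != 6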

Thanks very much! This is really helpful. I think I may have resolved the matmul issue.

As expected, a problem cropped up with the next operation

SDVariable difference = label.sub(out);
label is the placeholder variable for the label files
out is the matmul result

Another shape mismatch between the label placeHolder and the SDVariable out. I am working to fix it.

I don’t quite understand how shapes defined for placeholders are relevant. It looks like every time actual data is stored in a placeholder, its shape overrides the shape defined in the placeholder definition. Is this right?

When configuring a network in dl4j, matching nIn and nOut values for all the layers defined in the configuration is straightforward and the relationships are fairly clear.

I still have not been able to figure out the equivalent input/output relationships for SDVariable variables and how to manage them.

For the weights, using the rule you told me about, I iterate through the input data datasets and make sure that for each dataset the weight variable has the right shape, using the values of the shape of the input dataset.

Quick question, subtract is a simple matrix subtraction, right?

Please let me know if there are any docs which can shed some light on the rules for matrix operations and SDVariables in general, like the one you told me about for matmul.

Thanks

@adonnini placeholders are how you describe the variables that will be input into the graph. Those have shapes.

Regarding ins and outs it depends on the op. Some ops are full blown layers. Some are just ops. sd.nn.linear(…) provides in and out. So does sd.nn().lstmLayer(…).

Yes, subtraction is element-wise but it also allows broadcasting (e.g. matrix - 1 and similar calculations).
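
For example (a quick sketch with throwaway values):

    // imports: org.nd4j.linalg.api.ndarray.INDArray, org.nd4j.linalg.factory.Nd4j
    INDArray a = Nd4j.create(new float[][]{{1f, 2f}, {3f, 4f}});     // shape [2, 2]
    INDArray b = Nd4j.create(new float[][]{{0.5f, 0.5f}, {1f, 1f}}); // shape [2, 2]

    INDArray elementWise = a.sub(b);   // same shapes, element-wise:  [[0.5, 1.5], [2.0, 3.0]]
    INDArray broadcast   = a.sub(1.0); // scalar broadcast:           [[0.0, 1.0], [2.0, 3.0]]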

I understand the role of placeholders, thanks. And, mostly I understand how samediff works and is organized. I have read the documentation a number of times.

What I still do not get, and what is preventing me from making any meaningful progress, is the in/out relationships from one op to another, whether it’s a matrix operation or a full blown layer.

Probably, my limited understanding of neural nets shows when I say that when using dl4j I never had a problem figuring out in/out relationships and inferring the rules for chaining layers.

With samediff, I haven’t got a clue.

I fixed the matmul problem thanks to your explanation of the in/out rule for combining an input placeholder with weights via matmul. Mostly, it was a mechanical solution, not necessarily understanding on my part why it worked.

The same applies to my latest problem, making sub work, which I haven’t solved yet.

The shape of the result of the matmul operation is defined by the shape of the input placeholder.

When you take this result and try to calculate the difference between it and the labels, it can’t work, because the label placeholder shape by definition will always be different from the input placeholder shape. And, I don’t think my dataset is unique with regards to this.

So, I am stuck. And, frankly, I am beginning to think that a samediff implementation is beyond my capabilities. It’s unfortunate that, if I understood you correctly, ultimately you will transition completely from dl4j to samediff.

Thanks for all your help

@adonnini there are no relationships to be aware of. It’s all op dependent. You have to manage what the expected inputs are for a given layer and match that up to your variables. If you can narrow down what you want to do (eg: focus on lstmLayer or a linear layer) then give me some code to work with and I’ll help you with that specific circumstance.

Using the linear layer as an example, you pass in some weights. Those weights, due to the matrix multiply op, expect the rule we described above.

In general the output shapes will depend on the input. The variables you specify are fixed and only batch sizes are dynamic.

Regarding the labels, I have to disagree; that’s not valid. The labels will need to match whatever your final output is. Those are present in the same minibatch and by definition line up if you’re building a model.
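
To make that concrete, here is a minimal sketch of a linear model where the final output and the labels line up (the names and sizes are illustrative only, not your actual model):

    // imports: org.nd4j.autodiff.samediff.SameDiff, org.nd4j.autodiff.samediff.SDVariable,
    //          org.nd4j.linalg.api.buffer.DataType, org.nd4j.linalg.factory.Nd4j,
    //          org.nd4j.weightinit.impl.XavierInitScheme
    SameDiff sd = SameDiff.create();
    int nIn = 6, nOut = 2;

    // -1 only for the dynamic minibatch dimension; feature/label sizes are fixed
    SDVariable input = sd.placeHolder("input", DataType.FLOAT, -1, nIn);   // [miniBatch, 6]
    SDVariable label = sd.placeHolder("label", DataType.FLOAT, -1, nOut);  // [miniBatch, 2]

    // Fixed-size weights [nIn, nOut]: [miniBatch, nIn] x [nIn, nOut] -> [miniBatch, nOut]
    SDVariable w = sd.var("w", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT, nIn, nOut);
    SDVariable b = sd.var("b", Nd4j.zeros(DataType.FLOAT, nOut));

    SDVariable out = input.mmul(w).add(b);          // [miniBatch, nOut], same shape as label
    SDVariable diff = out.sub(label);               // the subtraction now lines up
    SDVariable mse = sd.math.square(diff).mean("mse");
    sd.setLossVariables("mse");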

I understand. In my case input has 6 columns (features) and labels has 2 columns (the inferred lat, lon).

This means that the initial nIn is 6 and the final nOut has to be 2.

When using dl4j it has always worked regardless of how many layers there are between the initial layer and the output layer. In the layers between the first and the last, nIn and nOut changed, but at the output layer nIn may have been something different from 6 while nOut was always 2.

In the samediff test I have been working on, the input placeholder has shape [minibatch,nIn,#rows in feature file]

As per your instructions, the second-to-last dimension of weights has the same value as the last dimension of input.
The value I give the other dimensions of weights does not affect the outcome.

After execution of the matmul operation, the output has shape [minibatch, nIn, dim2 of weights]

label has shape [minibatchsize, nOut, #rows in corresponding feature file] (as it should and has always had when training/testing the network using dl4j).

So, clearly, there is a mismatch of shape between the matmul output and label, and trying to execute difference fails. Specifically, dim1 in the matmul output is always nIn, and in label dim1 is nOut (as it should be).

Even if mechanically I could force the shapes of matmul output and label to be the same, it would be meaningless as it would compromise the data contained in both arrays.

Just to be clear the shape of the inputs of difference are:
out - [minibatch,nIn,#rows in input]
label - [minibatch,nOut,#rows in input]

What am I missing? Where am I going wrong?

would it be worth increasing dim1 of label to match dim1 of out? Is there an operation that can do that (I know that I could write code to do it manually)?

@adonnini I don’t think it’s remotely equivalent though. You’re not even using the LSTM ops. Do you mind doing a direct port of your network to samediff using the lstmLayer op?

What you’re sitting here saying to me is that 2 networks that aren’t equivalent work the same way, but one isn’t working. That’s actually not the case. The normal matmul is not and has never been dynamic. The LSTM layer does a lot more than just matmul to make it work.

Regarding your setup, apologies, I haven’t looked at it in depth. I’d have to see whether your final matrix multiply result is even correct or whether your labels are correct. If something isn’t lining up (eg: they’re not the same shape) then it’s not about dl4j vs samediff. It’s about whether the network is defined correctly, and I’m almost sure it’s not.

I followed your suggestion to replicate the original one-layer LSTM network I implemented using dl4j.

This

was the starting point for my code.

It seems to work up to a point. I have no idea how meaningful the results are, as I have been unable to add a loss function due to a shape mismatch.

After running for a number of cycles (2? I am not sure how to read the log file), it fails with a shape mismatch error.

Below you will find the error log and the latest version of the code (the error log is far longer).

I got to this point by trial and error without really understanding why the code works to the point that it does.

Please let me know what you think is going on. Thanks.

ERROR

Printing traindata dataset shape - 1
[32, 6, 57]
Printing testdata dataset shape - 1
[32, 6, 13]
Printing traindata feature and label dataset shape
[32, 6, 28]
[32, 2, 28]
features - dim0 - 32
features - dim1 - 6
features - dim2 - 28
Printing sd information
SameDiff(nVars=12,nOps=5)
--- Summary ---
Variables: 12 (5 with arrays)
Functions: 5
SameDiff Function Defs: 0
Loss function variables:

--- Variables ---

    - Name - - Array Shape - - Variable Type - - Data Type- - Output Of Function - - Inputs To Functions -
    add - ARRAY FLOAT add(add) [softmax]
    b1 [1] VARIABLE FLOAT [add]
    bias [2, 8] VARIABLE FLOAT [lstmLayer]
    input [32, 6, 28] PLACEHOLDER FLOAT [lstmLayer]
    label [32, 4] PLACEHOLDER FLOAT
    lstmLayer - ARRAY FLOAT lstmLayer(lstmLayer) [reduce_mean]
    matmul - ARRAY FLOAT matmul(matmul) [add]
    out - ARRAY FLOAT softmax(softmax)
    rWeights [2, 2, 8] VARIABLE FLOAT [lstmLayer]
    reduce_mean - ARRAY FLOAT reduce_mean(reduce_mean) [matmul]
    w1 [4, 4] VARIABLE FLOAT [matmul]
    weights [2, 28, 8] VARIABLE FLOAT [lstmLayer]

--- Functions ---
- Function Name - - Op - - Inputs - - Outputs -
0 lstmLayer LSTMLayer [input, weights, rWeights, bias] [lstmLayer]
1 reduce_mean Mean [lstmLayer] [reduce_mean]
2 matmul Mmul [reduce_mean, w1] [matmul]
3 add AddOp [matmul, b1] [add]
4 softmax SoftMax [add] [out]

Added differentiated op softmax
Added differentiated op add
Added differentiated op matmul
Added differentiated op reduce_mean
Added differentiated op lstmLayer
Debug info for node_2 input[0]; shape: [32, 6, 28]; ews: [1]; order: [f]; dtype: [FLOAT]; first values: [0.480073, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0]
Debug info for node_2 input[1]; shape: [2, 28, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [0.708899, 0.482904, 0.531003, 0.976715, 0.442821, 0.552599, 0.749225, 0.492696, 0.407787, 0.117383, 0.426929, 0.652086, 0.71485, 0.606519, 0.371322, 0.694302]
Debug info for node_2 input[2]; shape: [2, 2, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [0.478817, 0.713919, 0.795261, 0.31311, 0.652732, 0.998342, 0.847164, 0.0842575, 0.200632, 0.335908, 0.0698259, 0.963265, 0.213671, 0.445501, 0.940424, 0.396508]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Removing variable <1:3>
Executing op: [lstmLayer]
About to get variable in execute output
node_1:0 result shape: [32, 6, 4]; dtype: FLOAT; first values [0.849772, 0.73437, 5.83051, 8.03961, 1.46902, 1.28924, 2.67941, 3.98201, 1.91017, 1.73674, 1.42729, 2.21462, 3.25434, 3.84437, 1.34903, 1.72554, 5.95258, 8.76993, 0.933508, 1.02732, 11.9936, 19.9579, 0.680528, 0.64996, 0.868451, 0.76536, 1.02353, 1.11707, 0.993514, 0.691358, 0.39941, 0.334278]
Removing variable <1:0>
Removing variable <1:1>
Executing op: [matmul]
About to get variable in execute output
node_1:0 result shape: [32, 4]; dtype: FLOAT; first values [0.794465, 0.127477, -1.60831, -3.91174, 0.299192, 0.00638225, -0.0186475, -0.42616, 0.794887, 0.127653, -1.60946, -3.91384, 0.58846, 0.00785959, -0.438789, -1.62087, 1.11276, -0.0541707, -1.53608, -4.06199, 1.65794, -0.557415, -0.827336, -3.39457, 1.11354, -0.0576317, -1.52672, -4.04792, 1.55479, -0.499344, -0.85779, -3.3646]
Removing variable <1:0>
Removing variable <1:1>
Executing op: [add]
About to get variable in execute output
node_1:0 result shape: [32, 4]; dtype: FLOAT; first values [1.21663, 0.549647, -1.18614, -3.48957, 0.721362, 0.428552, 0.403522, -0.00399056, 1.21706, 0.549823, -1.18729, -3.49167, 1.01063, 0.43003, -0.0166191, -1.1987, 1.53493, 0.367999, -1.11391, -3.63982, 2.08011, -0.135245, -0.405167, -2.9724, 1.53571, 0.364538, -1.10456, -3.62575, 1.97696, -0.0771744, -0.43562, -2.94243]
Removing variable <1:0>
Removing variable <1:1>
Executing op: [softmax_bp]
About to get variable in execute output
node_1:0 result shape: [32, 4]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, 3.69649e-08, 1.89676e-08, 3.33882e-09, 3.33284e-10, 0, 0, 0, 0, 4.29518e-08, 1.33717e-08, 3.03812e-09, 2.43005e-10, 4.97198e-08, 5.42515e-09, 4.14177e-09, 3.17872e-10, 0, 0, 0, 0, 0, 0, 0, 0]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Executing op: [add_bp]
About to get variable in execute output
node_1:0 result shape: [32, 4]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, 3.69649e-08, 1.89676e-08, 3.33882e-09, 3.33284e-10, 0, 0, 0, 0, 4.29518e-08, 1.33717e-08, 3.03812e-09, 2.43005e-10, 4.97198e-08, 5.42515e-09, 4.14177e-09, 3.17872e-10, 0, 0, 0, 0, 0, 0, 0, 0]
About to get variable in execute output
node_1:1 result shape: [1]; dtype: FLOAT; first values [1.78814e-07]
Executing op: [adam_updater]
About to get variable in execute output
node_1:0 result shape: [1]; dtype: FLOAT; first values [0.000992596]
About to get variable in execute output
node_1:1 result shape: [1]; dtype: FLOAT; first values [1.79738e-12]
About to get variable in execute output
node_1:2 result shape: [1]; dtype: FLOAT; first values [4.23958e-06]
Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [1]; dtype: FLOAT; first values [0.421177]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Executing op: [matmul_bp]
Executing op: [matmul]
About to get variable in execute output
node_1:0 result shape: [32, 4]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, -7.22628e-09, 1.56904e-09, 3.0658e-08, -6.69388e-09, 0, 0, 0, 0, -1.12187e-08, 2.0442e-09, 3.38978e-08, -2.62647e-09, -1.62658e-08, 1.75764e-09, 3.76462e-08, 3.27964e-09, 0, 0, 0, 0, 0, 0, 0, 0]
Executing op: [matmul]
About to get variable in execute output
node_1:0 result shape: [4, 4]; dtype: FLOAT; first values [6.91882e-07, -4.85486e-07, 3.21119e-08, -2.81657e-09, 1.20087e-06, -2.27313e-07, 8.13507e-08, 3.53942e-09, 4.76888e-07, -1.98581e-07, 7.33345e-09, -1.04926e-08, 7.29635e-07, -1.09877e-07, 3.11691e-08, -7.05006e-09]
About to get variable in execute output
node_1:0 result shape: [32, 4]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, -7.22628e-09, 1.56904e-09, 3.0658e-08, -6.69388e-09, 0, 0, 0, 0, -1.12187e-08, 2.0442e-09, 3.38978e-08, -2.62647e-09, -1.62658e-08, 1.75764e-09, 3.76462e-08, 3.27964e-09, 0, 0, 0, 0, 0, 0, 0, 0]
About to get variable in execute output
node_1:1 result shape: [4, 4]; dtype: FLOAT; first values [6.91882e-07, -4.85486e-07, 3.21119e-08, -2.81657e-09, 1.20087e-06, -2.27313e-07, 8.13507e-08, 3.53942e-09, 4.76888e-07, -1.98581e-07, 7.33345e-09, -1.04926e-08, 7.29635e-07, -1.09877e-07, 3.11691e-08, -7.05006e-09]
Executing op: [adam_updater]
About to get variable in execute output
node_1:0 result shape: [4, 4]; dtype: FLOAT; first values [-0.00099122, 0.000990289, 0.000923081, -0.000990391, 0.00096508, 0.000893534, -0.000993512, -0.000992771, 0.000995655, 0.000981697, 0.000982201, 0.000995039, 0.000974414, -0.00099488, 0.000988444, -0.000992701]
About to get variable in execute output
node_1:1 result shape: [4, 4]; dtype: FLOAT; first values [1.27447e-12, 1.03996e-12, 1.44018e-14, 1.06229e-12, 7.63809e-14, 7.04372e-15, 2.34463e-12, 1.88619e-12, 5.25077e-12, 2.87671e-13, 3.04502e-13, 4.02234e-12, 1.45039e-13, 3.77599e-12, 7.31585e-13, 1.84979e-12]
About to get variable in execute output
node_1:2 result shape: [4, 4]; dtype: FLOAT; first values [-3.56999e-06, 3.22486e-06, 3.795e-07, -3.2593e-06, 8.73968e-07, 2.65402e-07, -4.84218e-06, -4.34305e-06, 7.24627e-06, 1.6961e-06, 1.74501e-06, 6.34224e-06, 1.20433e-06, -6.14495e-06, 2.7048e-06, -4.30095e-06]
Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [4, 4]; dtype: FLOAT; first values [-0.362927, 0.326351, 0.0367058, -0.324911, 0.0744229, 0.0279198, -0.484037, -0.433348, 0.718862, 0.170614, 0.173445, 0.633333, 0.112162, -0.612401, 0.26918, -0.429031]
Removing variable <1:0>
Removing variable <1:1>
Executing op: [reduce_mean_bp]
About to get variable in execute output
node_1:0 result shape: [32, 6, 4]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Debug info for node_2 input[0]; shape: [32, 6, 28]; ews: [1]; order: [f]; dtype: [FLOAT]; first values: [0.480073, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0]
Debug info for node_2 input[1]; shape: [2, 28, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [0.708899, 0.482904, 0.531003, 0.976715, 0.442821, 0.552599, 0.749225, 0.492696, 0.407787, 0.117383, 0.426929, 0.652086, 0.71485, 0.606519, 0.371322, 0.694302]
Debug info for node_2 input[2]; shape: [2, 2, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [0.478817, 0.713919, 0.795261, 0.31311, 0.652732, 0.998342, 0.847164, 0.0842575, 0.200632, 0.335908, 0.0698259, 0.963265, 0.213671, 0.445501, 0.940424, 0.396508]
Debug info for node_2 input[3]; shape: [2, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [0.634018, 0.182539, 0.817293, 0.734501, 0.221919, 0.359397, 0.909973, 0.57734, 0.274089, 0.148167, 0.000462413, 0.952388, 0.793436, 0.771765, 0.292129, 0.309894]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Removing variable <1:3>
Removing variable <1:4>
Executing op: [lstmLayer_bp]
About to get variable in execute output
node_1:0 result shape: [32, 6, 28]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
About to get variable in execute output
node_1:1 result shape: [2, 28, 8]; dtype: FLOAT; first values [4.13802e-08, 4.89966e-09, 3.10218e-08, 4.43322e-09, 7.48696e-08, 1.10441e-08, 1.41268e-08, 1.25986e-08, 3.90659e-09, -2.75095e-10, 5.84314e-10, 2.44075e-11, -9.23789e-08, -2.17932e-08, 9.25006e-11, 3.13561e-10, 4.06404e-09, -2.42605e-10, 5.73112e-10, 2.1081e-11, -9.24447e-08, -2.11804e-08, 4.27572e-12, 3.33022e-10, 4.03604e-09, -2.6178e-10, 5.71726e-10, 2.10758e-11, 6.95787e-09, 1.86189e-09, -4.75046e-11, 3.26383e-10]
About to get variable in execute output
node_1:2 result shape: [2, 2, 8]; dtype: FLOAT; first values [-1.93821e-07, -1.25492e-08, -7.40319e-08, -6.34455e-09, -4.13854e-07, -2.18169e-08, -2.77321e-08, -3.26106e-08, -1.52975e-07, -1.05278e-08, -6.53342e-08, -5.4067e-09, -3.84343e-07, -3.68898e-09, -2.43036e-08, -2.48422e-08, 2.93766e-07, 7.21267e-08, 3.25517e-08, 7.91499e-08, 4.2865e-07, 2.01337e-07, 1.5855e-07, 3.08569e-07, 3.59455e-07, 7.63541e-08, 6.85051e-08, 9.02766e-08, 5.35456e-07, 2.2367e-07, 1.87411e-07, 3.63651e-07]
About to get variable in execute output
node_1:3 result shape: [2, 8]; dtype: FLOAT; first values [-1.12297e-07, -2.21489e-08, -4.55664e-08, -5.49458e-09, -2.71389e-07, -3.62453e-08, -4.93015e-08, -3.61429e-08, 2.75381e-07, 1.40554e-07, 1.31891e-07, 6.49031e-08, 3.66783e-07, 2.81401e-07, 2.12401e-07, 2.94411e-07]
Executing op: [adam_updater]
About to get variable in execute output
node_1:0 result shape: [2, 28, 8]; dtype: FLOAT; first values [0.000995561, 0.000993495, 0.000994083, 0.000996773, 0.000992921, 0.000994311, 0.000995798, 0.000993624, 0.000992306, 0.000973766, 0.000992647, 0.000995174, 0.00099559, 0.000994811, 0.000991556, 0.000995466, 0.000996443, 0.000996296, 0.000983858, 0.000941161, 0.00098094, 0.000986143, 0.000996359, 0.000994685, 0.000995813, 0.000994135, 0.000979586, 0.000934845, 0.000994655, 0.000996165, 0.000996138, 0.000994928]
About to get variable in execute output
node_1:1 result shape: [2, 28, 8]; dtype: FLOAT; first values [5.03119e-12, 2.3324e-12, 2.8229e-12, 9.54046e-12, 1.96752e-12, 3.05483e-12, 5.61542e-12, 2.4287e-12, 1.6632e-12, 1.37779e-13, 1.82271e-12, 4.2521e-12, 5.09685e-12, 3.67596e-12, 1.37879e-12, 4.82054e-12, 7.84838e-12, 7.23368e-12, 3.71478e-13, 2.55853e-14, 2.64868e-13, 5.06446e-13, 7.48859e-12, 3.50235e-12, 5.65498e-12, 2.87279e-12, 2.30261e-13, 2.05869e-14, 3.46353e-12, 6.74598e-12, 6.65147e-12, 3.8485e-12]
About to get variable in execute output
node_1:2 result shape: [2, 28, 8]; dtype: FLOAT; first values [7.09313e-06, 4.82953e-06, 5.31314e-06, 9.76759e-06, 4.4357e-06, 5.52709e-06, 7.49366e-06, 4.92822e-06, 4.07826e-06, 1.1738e-06, 4.26935e-06, 6.52086e-06, 7.13927e-06, 6.06301e-06, 3.71323e-06, 6.94306e-06, 8.85917e-06, 8.50516e-06, 1.92739e-06, 5.05823e-07, 1.62749e-06, 2.25045e-06, 8.65372e-06, 5.9181e-06, 7.52001e-06, 5.35988e-06, 1.51745e-06, 4.53731e-07, 5.88521e-06, 8.21345e-06, 8.15571e-06, 6.20367e-06]
Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [2, 28, 8]; dtype: FLOAT; first values [0.707904, 0.48191, 0.530009, 0.975718, 0.441828, 0.551604, 0.748229, 0.491702, 0.406794, 0.116409, 0.425937, 0.65109, 0.713855, 0.605524, 0.37033, 0.693307, 0.884879, 0.849522, 0.191749, 0.0496409, 0.162692, 0.224271, 0.864375, 0.590812, 0.750964, 0.534996, 0.150759, 0.044438, 0.587457, 0.82033, 0.814575, 0.619368]
Executing op: [adam_updater]
About to get variable in execute output
node_1:0 result shape: [2, 2, 8]; dtype: FLOAT; first values [0.000993412, 0.000995589, 0.000996036, 0.000989999, 0.000995148, 0.000996842, 0.00099628, 0.000963691, 0.000984366, 0.000990671, 0.000956283, 0.000996728, 0.000985153, 0.000992951, 0.000996648, 0.000992083, 0.000983321, 0.000996377, 0.000990209, 0.000994209, 0.000955753, 0.000995983, 0.000996727, 0.000909622, 0.000986283, 0.000995567, 0.000994379, 0.000947502, 0.000994527, 0.000993998, 0.000974479, 0.00097095]
About to get variable in execute output
node_1:1 result shape: [2, 2, 8]; dtype: FLOAT; first values [2.2741e-12, 5.09495e-12, 6.31254e-12, 9.79972e-13, 4.20668e-12, 9.96238e-12, 7.17207e-12, 7.04438e-14, 3.96412e-13, 1.12762e-12, 4.78478e-14, 9.27763e-12, 4.40271e-13, 1.98436e-12, 8.83929e-12, 1.5702e-12, 3.4757e-13, 7.56255e-12, 1.02276e-12, 2.9478e-12, 4.66584e-14, 6.14613e-12, 9.27295e-12, 1.01296e-14, 5.1699e-13, 5.04343e-12, 3.12952e-12, 3.25743e-14, 3.30135e-12, 2.74225e-12, 1.458e-13, 1.11713e-13]
About to get variable in execute output
node_1:2 result shape: [2, 2, 8]; dtype: FLOAT; first values [4.76878e-06, 7.13794e-06, 7.9452e-06, 3.13047e-06, 6.48594e-06, 9.98124e-06, 8.46887e-06, 8.39314e-07, 1.99102e-06, 3.35803e-06, 6.91726e-07, 9.63211e-06, 2.09828e-06, 4.45464e-06, 9.40181e-06, 3.9626e-06, 1.86434e-06, 8.69635e-06, 3.19808e-06, 5.4294e-06, 6.83074e-07, 7.83978e-06, 9.62968e-06, 3.18273e-07, 2.27376e-06, 7.10176e-06, 5.59425e-06, 5.70743e-07, 5.74577e-06, 5.23669e-06, 1.20749e-06, 1.05695e-06]
Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [2, 2, 8]; dtype: FLOAT; first values [0.477823, 0.712924, 0.794264, 0.31212, 0.651737, 0.997345, 0.846167, 0.0832938, 0.199648, 0.334917, 0.0688696, 0.962268, 0.212686, 0.444508, 0.939427, 0.395516, 0.182513, 0.867917, 0.318493, 0.541154, 0.0630651, 0.780968, 0.960386, 0.027832, 0.222795, 0.708416, 0.557745, 0.055224, 0.568228, 0.520438, 0.1179, 0.101088]
Executing op: [adam_updater]
About to get variable in execute output
node_1:0 result shape: [2, 8]; dtype: FLOAT; first values [0.000995028, 0.000982951, 0.000996144, 0.000995713, 0.000985779, 0.000991269, 0.000996535, 0.000994549, 0.000988706, 0.000979296, 0.000360328, 0.000996693, 0.000996049, 0.000995934, 0.000989367, 0.000989993]
About to get variable in execute output
node_1:1 result shape: [2, 8]; dtype: FLOAT; first values [4.00551e-12, 3.32392e-13, 6.67214e-12, 5.39405e-12, 4.80502e-13, 1.28904e-12, 8.27144e-12, 3.329e-12, 7.66411e-13, 2.23717e-13, 3.17307e-17, 9.08269e-12, 6.35366e-12, 5.99964e-12, 8.6584e-13, 9.78665e-13]
About to get variable in execute output
node_1:2 result shape: [2, 8]; dtype: FLOAT; first values [6.32895e-06, 1.82317e-06, 8.16837e-06, 7.34447e-06, 2.19205e-06, 3.59034e-06, 9.09481e-06, 5.76979e-06, 2.76843e-06, 1.49573e-06, 1.78132e-08, 9.53038e-06, 7.97104e-06, 7.74579e-06, 2.94253e-06, 3.12838e-06]
Executing op: [subtract]
About to get variable in execute output
node_1:0 result shape: [2, 8]; dtype: FLOAT; first values [0.633023, 0.181556, 0.816297, 0.733506, 0.220933, 0.358406, 0.908977, 0.576345, 0.273101, 0.147188, 0.000102085, 0.951392, 0.79244, 0.770769, 0.29114, 0.308904]
Debug info for node_2 input[0]; shape: [32, 6, 37]; ews: [1]; order: [f]; dtype: [FLOAT]; first values: [0.240036, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0]
Debug info for node_2 input[1]; shape: [2, 28, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [0.707904, 0.48191, 0.530009, 0.975718, 0.441828, 0.551604, 0.748229, 0.491702, 0.406794, 0.116409, 0.425937, 0.65109, 0.713855, 0.605524, 0.37033, 0.693307]
Debug info for node_2 input[2]; shape: [2, 2, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [0.477823, 0.712924, 0.794264, 0.31212, 0.651737, 0.997345, 0.846167, 0.0832938, 0.199648, 0.334917, 0.0688696, 0.962268, 0.212686, 0.444508, 0.939427, 0.395516]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Removing variable <1:3>
Executing op: [lstmLayer]
Error at [/home/runner/work/deeplearning4j/deeplearning4j/libnd4j/include/ops/declarable/generic/nn/recurrent/lstmLayer.cpp:226:0]:
LSTM_LAYER operation: wrong shape of input weights, expected is [2, 37, 8], but got [2, 28, 8] instead !
Exception in thread "main" java.lang.RuntimeException: Op with name lstmLayer and op type [lstmLayer] execution failed with message Op validation failed
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1905)
at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6554)
at org.nd4j.autodiff.samediff.internal.InferenceSession.doExec(InferenceSession.java:801)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getOutputs(InferenceSession.java:255)
at org.nd4j.autodiff.samediff.internal.TrainingSession.getOutputs(TrainingSession.java:163)
at org.nd4j.autodiff.samediff.internal.TrainingSession.getOutputs(TrainingSession.java:45)
at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:533)
at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:154)
at org.nd4j.autodiff.samediff.internal.TrainingSession.trainingIteration(TrainingSession.java:129)
at org.nd4j.autodiff.samediff.SameDiff.fitHelper(SameDiff.java:1936)
at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1792)
at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1732)
at org.nd4j.autodiff.samediff.config.FitConfig.exec(FitConfig.java:172)
at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1712)
at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6.sameDiff(LocationNextNeuralNetworkV6.java:861)
at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6.main(LocationNextNeuralNetworkV6.java:199)

CODE

    SameDiff sd = SameDiff.create();
    Map<String,INDArray> placeholderData = new HashMap<>();

    //Properties for dataset:
    int nIn = 6;
    int nOut = 2;
    int miniBatchSize = 32;

    while(trainData.hasNext()) {
        placeholderData = new HashMap<>();
        DataSet t = trainData.next();

        System.out.println(" Printing traindata feature and label dataset shape");
        System.out.println(Arrays.toString(t.getFeatures().shape()));
        System.out.println(Arrays.toString(t.getLabels().shape()));

        INDArray features = t.getFeatures();
        INDArray labels = t.getLabels();
        placeholderData.put("input", features);
        placeholderData.put("label", labels);

        long dim0 = t.getFeatures().size(0);
        long dim1 = t.getFeatures().size(1);
        long dim2 = t.getFeatures().size(2);

        System.out.println(" features - dim0 - "+dim0);
        System.out.println(" features - dim1 - "+dim1);
        System.out.println(" features - dim2 - "+dim2);

        //Create input and label variables
        SDVariable input = sd.placeHolder("input", DataType.FLOAT, dim0, dim1, dim2);
        SDVariable label = sd.placeHolder("label", DataType.FLOAT, miniBatchSize, 4);

        LSTMLayerConfig mLSTMConfiguration = LSTMLayerConfig.builder()
                .lstmdataformat(LSTMDataFormat.NTS)
                .directionMode(LSTMDirectionMode.BIDIR_CONCAT)
                .gateAct(LSTMActivations.SIGMOID)
                .cellAct(LSTMActivations.SOFTPLUS)
                .outAct(LSTMActivations.SOFTPLUS)
                .retFullSequence(true)
                .retLastC(false)
                .retLastH(false)
                .build();

        LSTMLayerOutputs outputs = new LSTMLayerOutputs(sd.rnn.lstmLayer(
                input,
                LSTMLayerWeights.builder()
                        .weights(sd.var("weights", Nd4j.rand(DataType.FLOAT, 2, dim2, 4 * nOut)))
                        .rWeights(sd.var("rWeights", Nd4j.rand(DataType.FLOAT, 2, nOut, 4 * nOut)))
                        .bias(sd.var("bias", Nd4j.rand(DataType.FLOAT, 2, 4 * nOut)))
                        .build(),
                mLSTMConfiguration), mLSTMConfiguration);

        // Behaviour with default settings: 3d (time series) input with shape
        // [miniBatchSize, vectorSize, timeSeriesLength] → 2d output [miniBatchSize, vectorSize]
        SDVariable layer0 = outputs.getOutput();

        SDVariable layer1 = layer0.mean(1);

        SDVariable w1 = sd.var("w1", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT,  4, 4);
        SDVariable b1 = sd.var("b1", Nd4j.rand(DataType.FLOAT, 1));

        SDVariable out = sd.nn.softmax("out", layer1.mmul(w1).add(b1));

        //Create and set the training configuration
        double learningRate = 1e-3;
        TrainingConfig config = new TrainingConfig.Builder()
                .l2(1e-4)                               //L2 regularization
                .updater(new Adam(learningRate))        //Adam optimizer with specified learning rate
                .dataSetFeatureMapping("input")         //DataSet features array should be associated with variable "input"
                .dataSetLabelMapping("label")           //DataSet label array should be associated with variable "label"
                .build();

        sd.setTrainingConfig(config);

        System.out.println(" Printing sd information");
        System.out.println(sd.toString());
        System.out.println(sd.summary());

        //Perform training for 2 epochs
        int numEpochs = 2;
        sd.fit(trainData, numEpochs);

        //Evaluate on test set:
        String outputVariable = "softmax";
        Evaluation evaluation = new Evaluation();
        sd.evaluate(testData, outputVariable, evaluation);

        //Print evaluation statistics:
        System.out.println(evaluation.stats());

    }

@adonnini that’s much closer to what I’d expect to be reasonable. It looks like your weights are defined as 2,37,8 when they should be 2,28,8 as in the error message. Did you try to fix that?

Thanks for the feedback. I am encouraged.

Fixing the weights shape issue is pretty tricky (at least for me) because the shape of the input variable changes and the shape of the weights needs to change correspondingly.

This portion of the error log

Debug info for node_2 input[0]; shape: [32, 6, 37]; ews: [1]; order:
[f]; dtype: [FLOAT]; first values: [0.240036, -0, -0, -0, -0, -0, -0,
-0, -0, -0, -0, -0, -0, -0, -0, -0]
Debug info for node_2 input[1]; shape: [2, 28, 8]; ews: [1]; order: [c];
dtype: [FLOAT]; first values: [0.707904, 0.48191, 0.530009, 0.975718,
0.441828, 0.551604, 0.748229, 0.491702, 0.406794, 0.116409, 0.425937,
0.65109, 0.713855, 0.605524, 0.37033, 0.693307]

shows that while input shape has changed to [32, 6, 37], weights shape
is still [2, 28, 8]

I need to figure out how to update weights shape when input shape changes.

By the way, how do you update the value of an SDVariable or variables
like LSTMLayerOutputs or mLSTMConfiguration?

When I tried, it threw an error stating something like “variable has already been defined” (from what I remember).

Thanks

A related question.

I need to perform a training run on a single dataset. sd.fit works only for iterators, right? is sd.output the one to use in order to perform a training run on a single dataset?

Thanks

Is there an operation/method to reduce the size of a dimension of an INDArray? More specifically,

I have one array with shape [32, 2, 57] and a second array with shape [32, 6, 57]

I want to reduce dim1 of the second to 2 retaining the first two columns.

Do I need to write code manually to do this or is there a pre-built operation/method I can use?

Thanks

shows that while input shape has changed to [32, 6, 37], weights shape
is still [2, 28, 8]

I need to figure out how to update weights shape when input shape changes.

By the way, how do you update the value of an SDVariable or variables
like LSTMLayerOutputs or mLSTMConfiguration?

When I tried, it threw an error stating something like “variable has already been defined” (from what I remember).


That happens when you call fit normally. We hold a reference to a variable with an associated array.

Regarding the shapes, weights in general do not change shape. This is true even in dl4j.

That’s just not how neural networks work.

You should not be changing things manually yourself. Sorry I’d need more context on what you’re trying to do there.

I need to perform a training run on a single dataset. sd.fit works only for iterators, right? is sd.output the one to use in order to perform a training run on a single dataset?

Fit works on datasets:

https://github.com/deeplearning4j/deeplearning4j/blob/b8e9d4f157bab565e22bcf9f804fe6d2613036da/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/autodiff/samediff/SameDiff.java#L1777

You have to set the training config so it can match values to variables though.
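
Roughly along these lines (a sketch; the placeholder names and iterator variable here are illustrative):

    // imports: org.nd4j.autodiff.samediff.TrainingConfig, org.nd4j.linalg.dataset.DataSet,
    //          org.nd4j.linalg.learning.config.Adam
    TrainingConfig config = new TrainingConfig.Builder()
            .updater(new Adam(1e-3))
            .dataSetFeatureMapping("input")    // features map to the "input" placeholder
            .dataSetLabelMapping("label")      // labels map to the "label" placeholder
            .build();
    sd.setTrainingConfig(config);

    DataSet single = trainData.next();         // one DataSet, not the whole iterator
    sd.fit(single);                            // fit(DataSet, Listener...): one training iteration on it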

I have one array with shape [32, 2, 57] and a second array with shape [32, 6, 57]

I want to reduce dim1 of the second to 2 retaining the first two columns.

I would recommend shrink or something similar for this. Otherwise, in the next version (should be out within this week or next week) there is also the newer variable slicing API. It depends what you’re trying to do, whether that’s getting a subset of an array or reshaping it.
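
If it is just the subset you are after (the first two slices along dimension 1), plain indexing would also do it; a quick sketch:

    // imports: org.nd4j.linalg.api.buffer.DataType, org.nd4j.linalg.api.ndarray.INDArray,
    //          org.nd4j.linalg.factory.Nd4j, org.nd4j.linalg.indexing.NDArrayIndex
    INDArray big = Nd4j.rand(DataType.FLOAT, 32, 6, 57);

    // all of dim 0, indices 0..1 of dim 1, all of dim 2  ->  shape [32, 2, 57]
    INDArray firstTwo = big.get(NDArrayIndex.all(), NDArrayIndex.interval(0, 2), NDArrayIndex.all());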

"

Regarding the shapes, weights in general do not change shape. This is
true even in dl4j.

That’s just not how neural networks work.

"
However, if the input dataset shape changes and the weights shape does not change correspondingly, there is a shape mismatch.

What am I missing here?

Or is the masking used when defining the training/testing data iterator supposed to take care of that?

Basically, my problem is that when executing operations like matmul, dim1 of weights needs to be equal to dim2 of input, and dim2 of input changes from one dataset to the next.

The only way I have been able to make some progress is by looping (while hasNext()) through the iterator and attempting to adjust the weights shape at each loop cycle.

In this way, if I read the logs correctly, the lstmLayer, matmul, and softmax operations are completed successfully. logLoss is not. This is where I am currently stuck.

As you know, logLoss has two inputs: output (the result of all previous operations starting with lstmLayer and ending with softmax) and label.

output has shape [minibatchSize, nIn, X] as it should (I think). label, by definition, has shape [minibatchSize, nOut, X].

In my relatively limited experience, with dl4j I would be able to specify nIn and nOut at each stage as needed to make sure that they match the previous and next layer. When using samediff I don’t know/understand what the equivalent mechanism is.