Error using tensorMmul on permuted tensor

Hi there. I'm building a SameDiffVertex and have run into an issue where calling tensorMmul on a tensor which has previously been permuted causes an NPE in reduce.TensorMmul.doDiff when calculating gradients:

java.lang.NullPointerException: org.nd4j.linalg.api.ops.impl.reduce.TensorMmul.doDiff(TensorMmul.java:146)

which seems to be:-

int[] aAxes = range(0, larg().getShape().length);

I have tried a test case where I just use an sd.constant as the first argument to tensorMmul, with and without permuting it first, and the error goes away if I don't do the permute. Unfortunately skipping the permute is not an option in my real-world case. Is it possible these two ops don't play well together, or am I doing something dumb somewhere?
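For what it's worth, a quick way to check whether a null shape is the trigger might be something like the following. This is just a diagnostic sketch; I'm assuming SDVariable.getShape() reflects the same shape information that doDiff reads via larg().getShape():-

import org.nd4j.autodiff.samediff.SameDiff
import org.nd4j.linalg.factory.Nd4j

val sd = SameDiff.create()

// Same constant as in my test case, permuted back into the shape I actually need
val permuted = sd.constant("test_input", Nd4j.ones(5, 7, 4, 3)).permute(0, 3, 2, 1)

// If larg().getShape() is what returns null in doDiff, this should show it
val recorded = Option(permuted.getShape).map(_.mkString("[", ",", "]")).getOrElse("null")
println(s"recorded shape: $recorded")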

It looks like that operation is still using an old Java-based backprop implementation, and it may be that there is a problem there.

Without the code that you've tried, it is hard to tell what exactly is going on, though.

Thanks for the quick response. Understood. It's probably a showstopper for me unless there is a workaround. Let me try to pull a minimal gist together (it will be in Scala, if that's OK) since it looks like it might be a bug, but in brief, if I have (with weight an appropriately shaped matrix):-

val testInputNotPermuted = sd.constant("test_input_not", Nd4j.ones(5, 3, 4, 7))
sd.tensorMmul("output", testInputNotPermuted, weight, Array(3), Array(0), false, false, false)

the grads will calculate using sd.calculateGradients. But if I do:-

val testInputPermuted = sd.constant("test_input", Nd4j.ones(5, 7, 4, 3)).permute(0, 3, 2, 1)
sd.tensorMmul("output", testInputPermuted, weight, Array(3), Array(0), false, false, false)

I will get the NPE when I call calculateGradients. The forward pass seems correct in both cases, however.
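For completeness, this is roughly the whole test I'm running, as a minimal sketch. The weight shape and the sum-to-a-scalar loss are just scaffolding I've added here so that calculateGradients has a loss variable to work from; I'm assuming that part is not relevant to the problem:-

import java.util.Collections

import org.nd4j.autodiff.samediff.SameDiff
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j

val sd = SameDiff.create()

// Trainable weight, shaped so axis 3 of the input (size 7) contracts against axis 0
val weight = sd.`var`("weight", Nd4j.rand(7, 6))

// Case 1: plain constant of shape [5,3,4,7] -- gradients calculate fine
// val input = sd.constant("test_input_not", Nd4j.ones(5, 3, 4, 7))

// Case 2: constant permuted into the same [5,3,4,7] shape -- NPE in TensorMmul.doDiff
val input = sd.constant("test_input", Nd4j.ones(5, 7, 4, 3)).permute(0, 3, 2, 1)

val output = sd.tensorMmul("output", input, weight, Array(3), Array(0), false, false, false)

// Reduce to a scalar so there is a loss variable to differentiate
val loss = output.sum("loss")
sd.setLossVariables("loss")

// Forward pass works in both cases
println(output.eval().shapeInfoToString())

// This throws the NPE for the permuted case
val grads = sd.calculateGradients(Collections.emptyMap[String, INDArray](), "weight")
println(grads.get("weight"))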

It is possible to work around it in a similar way to how the workaround in this case worked:

However, you’ve got to understand how the native op works. If I get the opportunity, I’ll write up a workaround for this case too.

That would be great! Is the issue likely to be with sd.permute or sd.tensorMmul? I am guessing it's tensorMmul, as I'm pretty sure I have tried mmul (rather than tensorMmul) on permuted tensors and it has been fine. This is somewhat beyond my current understanding of the codebase, unfortunately.
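In the meantime, since mmul does seem fine on permuted inputs, I can probably fall back to writing this particular contraction (a rank-4 input against a rank-2 weight over a single axis) as reshape -> mmul -> reshape. A rough sketch of what I mean, assuming backprop through reshape and mmul is solid:-

import org.nd4j.autodiff.samediff.SameDiff
import org.nd4j.linalg.factory.Nd4j

val sd = SameDiff.create()
val weight = sd.`var`("weight", Nd4j.rand(7, 6))
val input = sd.constant("test_input", Nd4j.ones(5, 7, 4, 3)).permute(0, 3, 2, 1) // [5,3,4,7]

// tensorMmul(input, weight, axes 3/0) written out by hand: flatten the non-contracted
// axes, do an ordinary matrix multiply, then restore the leading axes
val flat = input.reshape(5 * 3 * 4, 7) // [60, 7]
val prod = flat.mmul(weight)           // [60, 6]
val output = prod.reshape(5, 3, 4, 6)  // same shape the tensorMmul output would have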

Would you like me to create an issue for this or wait until you have had a further look?

I've been taking a look at this and, as you suggested, have overridden the doDiff in TensorMmul to use the existing C++ implementation in libnd4j/include/ops/declarable/generic/blas/tensormmul.cpp. This has been a partial success. The issue I have now is that, from what I can see, the implementation of CUSTOM_OP_IMPL(tensormmul_bp) is not working for all cases. Specifically, it will generally error out if the ranks of the two input tensors differ; the forward pass is fine in those cases, however. From adding some extra nd4j_verbose calls in my own fork of the libnd4j code, it appears that the exception is being raised by ShapeUtils when called from the section

// calculate dLdA
MmulHelper::tensorDot(dLdC, B, dLdA, axesBdLdC, axesB, permutAt);

A "ShapeUtils::evalShapeForTensorDot method: the numbers of a axes and b axes to make dot product along must have identical values !" runtime error will be thrown at this point.
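To take SameDiff out of the picture, the native op can also be called directly, and a sketch like the following should, as far as I can tell, exercise the same code path for the mixed-rank cases. The iArgs layout here is an assumption on my part, copied from how the forward tensormmul op encodes its axes (count of A axes, the A axes, count of B axes, the B axes):-

import org.nd4j.linalg.api.ops.DynamicCustomOp
import org.nd4j.linalg.factory.Nd4j

// One of the failing cases below: a is rank 2, b is rank 4,
// contracting the second-to-last axis of each (a axis 0 against b axis 2, both size 2)
val a = Nd4j.ones(2, 1)
val b = Nd4j.ones(2, 3, 2, 1)

// dLdC has the shape of the forward output: free axes of a, then free axes of b
val dLdC = Nd4j.ones(1, 2, 3, 1)

val bp = DynamicCustomOp.builder("tensormmul_bp")
  .addInputs(a, b, dLdC)
  .addIntegerArguments(1, 0, 1, 2) // assumed layout: {numAxesA, axesA..., numAxesB, axesB...}
  .build()

// Expected outputs would be dLdA (shape of a) and dLdB (shape of b); instead, when the
// ranks differ, this hits the ShapeUtils::evalShapeForTensorDot error described above
val grads = Nd4j.exec(bp)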

To test further, I used TensorFlow from Python to generate a series of random test cases for input tensors A and B of different ranks, making sure that they were sized so that I could contract over at least one index. I dumped the resulting tensordot output and the A and B grads to .npy files, loaded them back into ND4J, and compared the results with what I was getting from my overridden tensorMmul op, which uses the C++ implementation for both the forward and backward passes. What I have so far, taking all combinations of tensor ranks for A and B from 2 to 6, is:-
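The ND4J side of the comparison is not very sophisticated: load the dumped arrays back in and compare within a tolerance. Roughly as below (the file names are just my own dump convention, and here I'm sanity-checking the forward pass against ND4J's built-in Nd4j.tensorMmul; the real test compares against my overridden op):-

import java.io.File

import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j

// Load the arrays dumped from the TensorFlow side
def loadNpy(path: String): INDArray = Nd4j.createFromNpyFile(new File(path))

val a = loadNpy("case_0_a.npy")
val b = loadNpy("case_0_b.npy")
val expected = loadNpy("case_0_tensordot.npy")
// TF grads, compared against the backward outputs of my overridden op
val expGradA = loadNpy("case_0_grad_a.npy")
val expGradB = loadNpy("case_0_grad_b.npy")

// Contract over the second-to-last index of each tensor, as in the cases listed below
val axes = Array(Array(a.rank() - 2), Array(b.rank() - 2))
val calc = Nd4j.tensorMmul(a, b, axes)
val forwardOk = calc.equalsWithEps(expected, 1e-5)
println("Calc " + (if (forwardOk) "pass" else "fail"))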

Output from ScalaTest. The input shapes are in the square brackets, and I have contracted over the second-to-last index of each tensor in all cases (though that is easy to change or randomise).

Pass
[info] a=[2,1], b=[2,1] Calc pass Grad pass
[info] a=[2,1], b=[3,2,1] Calc pass Grad pass
[info] a=[1,4,4], b=[2,4,4] Calc pass Grad pass
[info] a=[1,4,4], b=[1,4,4,4] Calc pass Grad pass
[info] a=[2,3,2,2], b=[1,3,2,2] Calc pass Grad pass
[info] a=[2,3,2,2], b=[1,1,4,2,2] Calc pass Grad pass
[info] a=[4,4,4,4,2], b=[4,1,3,4,2] Calc pass Grad pass
[info] a=[4,4,4,4,2], b=[2,2,2,1,4,2] Calc pass Grad pass
[info] a=[3,3,2,1,4,3], b=[4,2,4,2,4,3] Calc pass Grad pass

Fail
[info] a=[2,1], b=[2,3,2,1] Calc pass Grad crash
[info] a=[2,1], b=[1,2,1,2,1] Calc pass Grad crash
[info] a=[2,1], b=[3,3,1,1,2,1] Calc pass Grad crash
[info] a=[1,4,4], b=[4,4] Calc pass Grad crash
[info] a=[1,4,4], b=[1,1,1,4,4] Calc pass Grad crash
[info] a=[1,4,4], b=[3,3,4,4,4,4] Calc pass Grad crash
[info] a=[2,3,2,2], b=[2,2] Calc pass Grad crash
[info] a=[2,3,2,2], b=[2,2,2] Calc pass Grad crash
[info] a=[2,3,2,2], b=[2,2,4,3,2,2] Calc pass Grad crash
[info] a=[4,4,4,4,2], b=[4,2] Calc pass Grad crash
[info] a=[4,4,4,4,2], b=[2,4,2] Calc pass Grad crash
[info] a=[4,4,4,4,2], b=[3,2,4,2] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[4,3] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[2,4,3] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[2,4,4,3] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[3,3,3,4,3] Calc pass Grad crash

The good news is that the forward pass ("Calc") passes for all cases. The grads agree when they don't error out as described above, but for most cases the grads will not calculate. All the cases where A and B are of the same rank do work, but this should not be a requirement in the general case. I note that all the tests of tensormmul_bp in libnd4j/tests_cpu/layers_tests/DeclarableOpsTests15.cpp only consider cases where the ranks of A and B are the same, so this would not have been picked up by those tests.

If this is genuinely an issue and I haven't got the wrong idea, let me know what would be useful. I can raise an issue, put some code up in a gist, etc. Given that TensorFlow implements a dense layer in terms of tensordot (I believe), not being able to backprop through it would seem to be potentially problematic, not just for my somewhat esoteric use case.

Hi, yes please file an issue. Thanks!

That’s done. Gist at:-