I’ve been taking a look at this and, as you suggested, have overridden *doDiff* in *TensorMmul* to use the existing C++ implementation in *libnd4j/include/ops/declarable/generic/blas/tensormmul.cpp*.

This has been a partial success. The issue I have now is that, from what I can see, the implementation of *CUSTOM_OP_IMPL(tensormmul_bp)* does not work for all cases. Specifically, it will generally error out if the ranks of the two input tensors differ; the forward pass is fine in those cases, however. After adding some extra *nd4j_verbose* calls in my own fork of the libnd4j code, it appears that the exception is raised by *ShapeUtils* when called from the section

*// calculate dLdA*

*MmulHelper::tensorDot(dLdC, B, dLdA, axesBdLdC, axesB, permutAt);*

A *“ShapeUtils::evalShapeForTensorDot method: the numbers of a axes and b axes to make dot product along must have identical values !”* runtime error will be thrown at this point.
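For reference, the same contraction and its backward pass go through fine in plain numpy for one of the failing shape pairs (a=[2,1], b=[2,3,2,1]): the two axis lists passed to the dLdA tensordot are equal in length by construction, so the problem presumably lies in how *tensormmul_bp* builds those axis lists when the ranks differ. A sketch (the axis choices here are mine, mirroring the second-last-index contraction used in my tests):

```python
import numpy as np

A = np.random.rand(2, 1)
B = np.random.rand(2, 3, 2, 1)

# Forward: contract A axis 0 with B axis 2 (both size 2)
C = np.tensordot(A, B, axes=([0], [2]))  # shape (1, 2, 3, 1)

dLdC = np.ones_like(C)

# dLdA: contract dLdC's B-free axes (1, 2, 3) with B's free axes (0, 1, 3);
# both axis lists have length 3, so the rank mismatch is not a problem here
dLdA = np.tensordot(dLdC, B, axes=([1, 2, 3], [0, 1, 3]))  # shape (1, 2)
dLdA = dLdA.transpose(1, 0)  # permute back to A's shape (2, 1)
```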

To test further, I used TensorFlow from Python to generate a series of random test cases for input tensors A and B of different ranks, sized so that I could contract over at least one index. I dumped the resulting tensordot output and the A and B grads to .npy files, loaded them back into ND4J, and compared them against the results from my overridden *tensormmul* op, which uses the C++ implementation for both the forward and backward passes. What I have so far, taking all combinations of tensor ranks for A and B from 2 to 6, is:
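The generation side of the harness was along these lines (a sketch only; the helper name and file naming are mine, and as in my tests the contraction is over the second-last index of each tensor):

```python
import numpy as np
import tensorflow as tf

def dump_case(a_shape, b_shape, prefix):
    # Random inputs whose second-last dims must match so we can contract over them
    a = tf.random.uniform(a_shape, dtype=tf.float64)
    b = tf.random.uniform(b_shape, dtype=tf.float64)
    axes = [[len(a_shape) - 2], [len(b_shape) - 2]]
    with tf.GradientTape() as tape:
        tape.watch([a, b])
        c = tf.tensordot(a, b, axes)
        # Reduce to a scalar so the grads come back with the shapes of a and b
        loss = tf.reduce_sum(c)
    grad_a, grad_b = tape.gradient(loss, [a, b])
    # Dump reference values to load back on the ND4J side
    np.save(prefix + "_c.npy", c.numpy())
    np.save(prefix + "_dlda.npy", grad_a.numpy())
    np.save(prefix + "_dldb.npy", grad_b.numpy())

dump_case([2, 1], [2, 3, 2, 1], "case_2x4")
```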

Output from ScalaTest. The input shapes are in the square brackets, and I have contracted over the second-last index in all cases (though that is easy to change or randomise).

**Pass**

[info] a=[2,1], b=[2,1] Calc pass Grad pass

[info] a=[2,1], b=[3,2,1] Calc pass Grad pass

[info] a=[1,4,4], b=[2,4,4] Calc pass Grad pass

[info] a=[1,4,4], b=[1,4,4,4] Calc pass Grad pass

[info] a=[2,3,2,2], b=[1,3,2,2] Calc pass Grad pass

[info] a=[2,3,2,2], b=[1,1,4,2,2] Calc pass Grad pass

[info] a=[4,4,4,4,2], b=[4,1,3,4,2] Calc pass Grad pass

[info] a=[4,4,4,4,2], b=[2,2,2,1,4,2] Calc pass Grad pass

[info] a=[3,3,2,1,4,3], b=[4,2,4,2,4,3] Calc pass Grad pass

**Fail**

[info] a=[2,1], b=[2,3,2,1] Calc pass Grad crash

[info] a=[2,1], b=[1,2,1,2,1] Calc pass Grad crash

[info] a=[2,1], b=[3,3,1,1,2,1] Calc pass Grad crash

[info] a=[1,4,4], b=[4,4] Calc pass Grad crash

[info] a=[1,4,4], b=[1,1,1,4,4] Calc pass Grad crash

[info] a=[1,4,4], b=[3,3,4,4,4,4] Calc pass Grad crash

[info] a=[2,3,2,2], b=[2,2] Calc pass Grad crash

[info] a=[2,3,2,2], b=[2,2,2] Calc pass Grad crash

[info] a=[2,3,2,2], b=[2,2,4,3,2,2] Calc pass Grad crash

[info] a=[4,4,4,4,2], b=[4,2] Calc pass Grad crash

[info] a=[4,4,4,4,2], b=[2,4,2] Calc pass Grad crash

[info] a=[4,4,4,4,2], b=[3,2,4,2] Calc pass Grad crash

[info] a=[3,3,2,1,4,3], b=[4,3] Calc pass Grad crash

[info] a=[3,3,2,1,4,3], b=[2,4,3] Calc pass Grad crash

[info] a=[3,3,2,1,4,3], b=[2,4,4,3] Calc pass Grad crash

[info] a=[3,3,2,1,4,3], b=[3,3,3,4,3] Calc pass Grad crash

The good news is that the forward pass (“Calc”) passes in all cases, and the grads agree whenever they don’t error out as described above; for most rank combinations, though, the grads will not calculate at all. All the cases where A and B have the same rank do work, but this should not be a requirement in the general case. I note that all the tests of *tensormmul_bp* in *libnd4j/tests_cpu/layers_tests/DeclarableOpsTests15.cpp* only consider cases where A and B have the same rank, so this would not have been picked up by those tests.

If this is genuinely an issue and I haven’t got the wrong idea, let me know what would be useful: I can raise an issue, put some code up in a gist, etc. Given that TensorFlow implements a dense layer in terms of tensordot (I believe), not being able to backprop through it would seem potentially problematic, not just for my somewhat esoteric use case.