I've been taking a look at this and, as you suggested, have overridden doDiff in TensorMmul to use the existing C++ implementation in libnd4j/include/ops/declarable/generic/blas/tensormmul.cpp.
This has been a partial success. The issue I have now is that, from what I can see, the implementation of CUSTOM_OP_IMPL(tensormmul_bp) does not work for all cases: it generally errors out when the ranks of the two input tensors differ, even though the forward pass is fine in those cases. After adding some extra nd4j_verbose calls in my own fork of the libnd4j code, it appears the exception is raised by ShapeUtils when called from this section:
```cpp
// calculate dLdA
MmulHelper::tensorDot(dLdC, B, dLdA, axesBdLdC, axesB, permutAt);
```
At that point a runtime error is thrown: "ShapeUtils::evalShapeForTensorDot method: the numbers of a axes and b axes to make dot product along must have identical values !"
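The calculation itself is well defined when the ranks differ. As a sanity check, here is the dLdA computation in numpy for the first failing case in the table below (a=[2,1], b=[2,3,2,1], contracting over the second-last index of each); the axis bookkeeping is just my own working, not taken from the libnd4j code:

```python
import numpy as np

a = np.random.rand(2, 1)
b = np.random.rand(2, 3, 2, 1)

# Forward: contract a axis 0 with b axis 2 (the second-last index of each).
c = np.tensordot(a, b, axes=([0], [2]))               # shape (1, 2, 3, 1)

dLdC = np.ones_like(c)                                # stand-in upstream gradient

# Backward for a: contract dLdC with b over all of b's *free* axes.
# b's free axes (0, 1, 3) appear in c at positions (1, 2, 3). Note both axis
# lists have length rank(b) - 1 = 3, not rank(a) - 1 = 1.
dLdA = np.tensordot(dLdC, b, axes=([1, 2, 3], [0, 1, 3]))   # shape (1, 2)
dLdA = dLdA.T                                         # permute back to a's shape
assert dLdA.shape == a.shape
```

Since both axis lists here are sized by rank(b) rather than rank(a), my (unverified) suspicion is that the failing code path derives one of the lists from the wrong tensor's rank, which would only show up when the ranks differ.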
To test further, I used TensorFlow from Python to generate a series of random test cases for input tensors A and B of different ranks, making sure they were sized so that I could contract over at least one index. I dump the resulting tensordot output and the A and B grads to .npy files, load them back into ND4J, and compare them against my overridden TensorMmul op, which uses the C++ implementation for both the forward and backward passes.
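The generator is essentially the following (a simplified sketch of the idea; the real script randomises the shapes, and the filenames and the reduce_sum trick for getting an all-ones dL/dC are just my conventions):

```python
import numpy as np
import tensorflow as tf

def dump_case(a_shape, b_shape, a_axis, b_axis, prefix):
    """Generate one tensordot case and dump inputs, output and grads to .npy."""
    a = tf.constant(np.random.rand(*a_shape))
    b = tf.constant(np.random.rand(*b_shape))
    with tf.GradientTape() as tape:
        tape.watch([a, b])
        c = tf.tensordot(a, b, axes=[[a_axis], [b_axis]])
        loss = tf.reduce_sum(c)        # scalar loss => dL/dC is all ones
    grad_a, grad_b = tape.gradient(loss, [a, b])
    for name, t in [("a", a), ("b", b), ("c", c), ("dlda", grad_a), ("dldb", grad_b)]:
        np.save(f"{prefix}_{name}.npy", t.numpy())

# e.g. the rank-2 vs rank-4 case, contracting the second-last index of each:
dump_case((2, 1), (2, 3, 2, 1), a_axis=0, b_axis=2, prefix="case_r2_r4")
```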
What I have so far, taking all combinations of tensor ranks for A and B from 2 to 6, is below (output from ScalaTest). The input shapes are in the square brackets, and I contracted over the second-last index in all cases (though that is easy to change or randomise).
Pass
[info] a=[2,1], b=[2,1] Calc pass Grad pass
[info] a=[2,1], b=[3,2,1] Calc pass Grad pass
[info] a=[1,4,4], b=[2,4,4] Calc pass Grad pass
[info] a=[1,4,4], b=[1,4,4,4] Calc pass Grad pass
[info] a=[2,3,2,2], b=[1,3,2,2] Calc pass Grad pass
[info] a=[2,3,2,2], b=[1,1,4,2,2] Calc pass Grad pass
[info] a=[4,4,4,4,2], b=[4,1,3,4,2] Calc pass Grad pass
[info] a=[4,4,4,4,2], b=[2,2,2,1,4,2] Calc pass Grad pass
[info] a=[3,3,2,1,4,3], b=[4,2,4,2,4,3] Calc pass Grad pass
Fail
[info] a=[2,1], b=[2,3,2,1] Calc pass Grad crash
[info] a=[2,1], b=[1,2,1,2,1] Calc pass Grad crash
[info] a=[2,1], b=[3,3,1,1,2,1] Calc pass Grad crash
[info] a=[1,4,4], b=[4,4] Calc pass Grad crash
[info] a=[1,4,4], b=[1,1,1,4,4] Calc pass Grad crash
[info] a=[1,4,4], b=[3,3,4,4,4,4] Calc pass Grad crash
[info] a=[2,3,2,2], b=[2,2] Calc pass Grad crash
[info] a=[2,3,2,2], b=[2,2,2] Calc pass Grad crash
[info] a=[2,3,2,2], b=[2,2,4,3,2,2] Calc pass Grad crash
[info] a=[4,4,4,4,2], b=[4,2] Calc pass Grad crash
[info] a=[4,4,4,4,2], b=[2,4,2] Calc pass Grad crash
[info] a=[4,4,4,4,2], b=[3,2,4,2] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[4,3] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[2,4,3] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[2,4,4,3] Calc pass Grad crash
[info] a=[3,3,2,1,4,3], b=[3,3,3,4,3] Calc pass Grad crash
The good news is that the forward pass ("Calc") passes in all cases, and the grads agree whenever they don't error out; for most cases, though, the grads will not calculate at all. Every case where A and B have the same rank works, but that should not be a requirement in the general case. I note that all the tests of tensormmul_bp in
libnd4j/tests_cpu/layers_tests/DeclarableOpsTests15.cpp
only consider cases where the ranks of A and B are the same, so this would not have been picked up by those tests.
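If it would help, a minimal unequal-rank case with deterministic values could be added there. Here is a numpy sketch of the reference values such a test could assert against (a rank-2 against rank-3 case; I am assuming the C++ op uses the same output layout as numpy here):

```python
import numpy as np

# Deterministic rank-2 vs rank-3 case: contract a axis 0 with b axis 2.
a = np.arange(6.0).reshape(2, 3)
b = np.arange(24.0).reshape(3, 4, 2)

c = np.tensordot(a, b, axes=([0], [2]))                   # shape (3, 3, 4)
dLdC = np.ones_like(c)                                    # take dL/dC = 1

# dLdA[i, j] = sum_{k, l} dLdC[j, k, l] * b[k, l, i]
dLdA = np.tensordot(dLdC, b, axes=([1, 2], [0, 1])).T     # shape (2, 3)

# dLdB[k, l, i] = sum_j a[i, j] * dLdC[j, k, l]
dLdB = np.tensordot(a, dLdC, axes=([1], [0])).transpose(1, 2, 0)  # shape (3, 4, 2)
```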
If this is genuinely an issue and I haven't got the wrong idea, let me know what would be useful; I can raise an issue, put some code up in a gist, etc. Given that TensorFlow implements a dense layer in terms of tensordot (I believe), not being able to backprop through it would seem potentially problematic beyond my somewhat esoteric use case.
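For what it's worth, the unequal-rank case is easy to hit from Keras: applying a Dense layer to an input with more than two dimensions goes (as far as I understand the implementation) through a tensordot of the input with a rank-2 kernel:

```python
import tensorflow as tf

# Dense on a rank-3 input [batch, time, features]: internally this contracts
# the last axis of the input with axis 0 of the (8, 16) kernel, i.e. a
# rank-3-by-rank-2 tensordot -- exactly the pattern that crashes above.
x = tf.random.normal([4, 10, 8])
y = tf.keras.layers.Dense(16)(x)
print(y.shape)   # (4, 10, 16)
```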