Implement Dot Product layer

Hi, I am trying to implement a dot product layer. I have gotten to the point where I can do an element-wise multiplication. After that, if I understand correctly, I need to add an activation layer that sums across the columns (i.e. reduces along dimension 1). I am trying the code below, but it doesn’t work as expected: it just outputs the input it was given.

            ComputationGraph model;
            ComputationGraphConfiguration conf;

            NeuralNetConfiguration.Builder confBuilder = new NeuralNetConfiguration.Builder();
            // ---------------------

            // ---------------------
            // layering
            ComputationGraphConfiguration.GraphBuilder graphBuilder = confBuilder.graphBuilder();
            graphBuilder.addInputs("layer_input_");
            graphBuilder.setInputTypes(InputType.inferInputType(Nd4j.ones(1, 6)));
            graphBuilder.addVertex("layer_output_Betas_", new SubsetVertex(0, 2), "layer_input_");
            graphBuilder.addVertex("layer_output_F_",     new SubsetVertex(3, 5), "layer_input_");

            graphBuilder.addLayer("denselayer",
                    new DenseLayer.Builder().nIn(3).nOut(1).activation(Activation.IDENTITY)
                            .build(),
                    "layer_output_Betas_");
            // dot product of betas and factors
            graphBuilder.addVertex("product_vertex",
                    new ElementWiseVertex(ElementWiseVertex.Op.Product), "layer_output_Betas_", "layer_output_F_");
            graphBuilder.addLayer("layer_product_",
                    new ActivationLayer.Builder().activation(Activation.IDENTITY).build(), "product_vertex");
            // NOTE: named "layer_dot_" so that it matches the input of "layer_output_" below
            graphBuilder.addLayer("layer_dot_", new ActivationLayer.Builder().activation(new SumActivation()).build(), "layer_product_");
            // graphBuilder.addLayer("layer_dot_",
            //         new GlobalPoolingLayer.Builder().poolingDimensions(1).poolingType(PoolingType.SUM).build(), "layer_product_");
            // graphBuilder.inputPreProcessor("layer_dot_", new FeedForwardToRnnPreProcessor());
            graphBuilder.addLayer("layer_output_", new OutputLayer.Builder(new LossMSE())
                    .nIn(1)
                    .nOut(1)
                    .activation(new ActivationIdentity())
                    .build(), "layer_dot_");
            graphBuilder.setOutputs("layer_output_", "denselayer");

SumActivation.java


    public class SumActivation extends BaseActivationFunction {
        @Override
        public INDArray getActivation(INDArray in, boolean training) {
            Sum op = new Sum(in, 1);
            Nd4j.getExecutioner().execAndReturn(op);
    //        return op.getZ();
            return in;
        }

        @Override
        public Pair<INDArray, INDArray> backprop(INDArray in, INDArray epsilon) {
            assertShape(in, epsilon);
            Sum op = new Sum(in, 1);
            Nd4j.getExecutioner().execAndReturn(op);
            return new Pair<>(in, null);
        }
    }

The execAndReturn method updates the input NDArray for other activations, but here it doesn’t update the input array. Also, if I return op.getZ(), it throws an error. Any ideas what I am doing wrong here?

A few things to note here:

Why are you using it that way anyway? Why not just return in.sum(1)?

But there are more problems in your approach than that. Activation functions are expected to be element-wise, so abusing an Activation Layer for that purpose isn’t going to work.
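To make the element-wise point concrete, here is a minimal sketch in plain Java arrays (no ND4J; the class and method names are made up for illustration). The element-wise product keeps the [n, k] shape, while the row sum turns it into [n, 1], which is exactly the kind of shape change an activation function is not supposed to make.

```java
import java.util.Arrays;

class ShapeDemo {
    /** Element-wise product: the output has the same shape as the inputs, [n][k]. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] out = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            out[i] = new double[a[i].length];
            for (int j = 0; j < a[i].length; j++) {
                out[i][j] = a[i][j] * b[i][j];
            }
        }
        return out;
    }

    /** Reduction along dimension 1: [n][k] becomes [n][1], so the shape changes. */
    static double[][] sumRows(double[][] x) {
        double[][] out = new double[x.length][1];
        for (int i = 0; i < x.length; i++) {
            for (double v : x[i]) {
                out[i][0] += v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] betas = {{1, 2, 3}, {4, 5, 6}};
        double[][] factors = {{1, 0, 2}, {0.5, 1, 0}};
        double[][] product = multiply(betas, factors); // still 2 x 3
        double[][] dot = sumRows(product);             // now 2 x 1
        System.out.println(Arrays.deepToString(product));
        System.out.println(Arrays.deepToString(dot));  // [[7.0], [7.0]]
    }
}
```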

I see in the commented-out region that you have tried using GlobalPoolingLayer, which might work, but I guess you will have to play around with the dimensions to find the parameters that actually give you the result you want.

Overall, though, the approach is a bit convoluted to begin with. It would be a whole lot easier to use a SameDiffVertex and define your layer that way.

The definition would look something like this (not tested; you might have to set up the dimensions for dot correctly):

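    import java.util.Map;

    import org.deeplearning4j.nn.conf.graph.GraphVertex;
    import org.deeplearning4j.nn.conf.inputs.InputType;
    import org.deeplearning4j.nn.conf.inputs.InvalidInputTypeException;
    import org.deeplearning4j.nn.conf.layers.samediff.SDVertexParams;
    import org.deeplearning4j.nn.conf.layers.samediff.SameDiffVertex;
    import org.nd4j.autodiff.samediff.SDVariable;
    import org.nd4j.autodiff.samediff.SameDiff;
    import org.nd4j.linalg.api.ndarray.INDArray;

    class DotVertex extends SameDiffVertex {
        @Override
        public void defineParametersAndInputs(SDVertexParams params) {
            // Two inputs, no trainable parameters
            params.defineInputs("a", "b");
        }

        @Override
        public InputType getOutputType(int layerIndex, InputType... vertexInputs) throws InvalidInputTypeException {
            // One value per example
            return InputType.feedForward(1);
        }

        @Override
        public SDVariable defineVertex(SameDiff sameDiff, Map<String, SDVariable> layerInput, Map<String, SDVariable> paramTable, Map<String, SDVariable> maskVars) {
            return layerInput.get("a").dot(layerInput.get("b"));
        }

        @Override
        public GraphVertex clone() {
            return new DotVertex();
        }

        @Override
        public void validateInput(INDArray[] input) {}

        @Override
        public void initializeParameters(Map<String, INDArray> params) {}
    }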

@treo Thanks for your response. I tried creating a custom vertex, but that doesn’t work either. I get an error like this:

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Input that is not a matrix; expected matrix (rank 2), got rank 1 array with shape [3]. Missing preprocessor or wrong input type? (layer name: layer_output_, layer index: 6, layer type: OutputLayer)
at org.deeplearning4j.nn.layers.BaseLayer.preOutputWithPreNorm(BaseLayer.java:306)
at org.deeplearning4j.nn.layers.BaseLayer.preOutput(BaseLayer.java:289)
at org.deeplearning4j.nn.layers.BaseLayer.activate(BaseLayer.java:337)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2380)
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1741)
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1697)
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1822)
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1808)
at com.axegine.evan.poc.foo.DotProductTest.main(DotProductTest.java:113)

I think the issue above is that the output of dot is a vector, so its rank is 1. Is there a way I can force the output to be a 2D matrix?

I also tried the first recommendation, and it gives me the following error:

Exception in thread "main" java.lang.IllegalStateException: Feed forward (inference): array (ACTIVATIONS) workspace validation failed (vertex layer_dot_ - class: ActivationLayer) - array is defined in incorrect workspace
	at org.deeplearning4j.nn.graph.ComputationGraph.validateArrayWorkspaces(ComputationGraph.java:1888)
	at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2404)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1741)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1697)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1822)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1808)
	at com.axegine.evan.poc.foo.DotProductTest.main(DotProductTest.java:99)
Caused by: org.nd4j.linalg.workspace.ND4JWorkspaceException: Array workspace validation failed: Array of type ACTIVATIONS should be in workspace "WS_LAYER_ACT_3" but is in workspace "WS_LAYER_WORKING_MEM"
	at org.nd4j.linalg.workspace.BaseWorkspaceMgr.validateArrayLocation(BaseWorkspaceMgr.java:238)
	at org.deeplearning4j.nn.workspace.LayerWorkspaceMgr.validateArrayLocation(LayerWorkspaceMgr.java:86)
	at org.deeplearning4j.nn.graph.ComputationGraph.validateArrayWorkspaces(ComputationGraph.java:1879)
	... 6 more

I guess that is precisely because the dimensions were not set.

You’ll need to pass the dimensions so that dot is applied to each example individually. See Eclipse Deeplearning4j · GitHub…-
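As a rough illustration of what passing the dimensions changes, the sketch below uses plain Java arrays instead of SameDiff (the class and method names are hypothetical). A dot over everything collapses the whole batch into a single number, whereas a dot along dimension 1 gives one value per example.

```java
import java.util.Arrays;

class DotDims {
    /** Dot over all entries: two [n][k] inputs collapse to a single number. */
    static double dotAll(double[][] a, double[][] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                sum += a[i][j] * b[i][j];
            }
        }
        return sum;
    }

    /** Dot along dimension 1: one value per example, i.e. a length-n vector (rank 1). */
    static double[] dotPerExample(double[][] a, double[][] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                out[i] += a[i][j] * b[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[][] b = {{1, 1, 1}, {2, 0, 1}};
        System.out.println(dotAll(a, b));                         // 20.0
        System.out.println(Arrays.toString(dotPerExample(a, b))); // [6.0, 14.0]
    }
}
```

Note that the per-example result is still a rank-1 vector, so it needs one more axis before it can feed a layer that expects a matrix.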

@treo I was setting the dimensions, but it appears that I need to reshape the output of dot, since the dot function returns a vector (rank 1) and we need a 2D matrix.

    public SDVariable defineVertex(SameDiff sameDiff, Map<String, SDVariable> layerInput, Map<String, SDVariable> paramTable, Map<String, SDVariable> maskVars) {
        final SDVariable inputA = layerInput.get("a");
        long numRows = inputA.getShape()[0];
        return layerInput.get("a").dot(layerInput.get("b"), 1).reshape(numRows, 1);
    }

When defining a SameDiff vertex you should never do that: defineVertex is only called once, on the very first iteration, so the batch size you read there may be wrong later if your batches are not all the same size through to the end.

Use expandDims instead here.
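The difference can be sketched in plain Java (no SameDiff; all names here are hypothetical). A reshape that bakes in the row count of the first batch breaks on a smaller final batch, while an expandDims-style operation only appends an axis of size 1 and never needs to know the batch size. In the vertex itself this would presumably be something like `sameDiff.expandDims(a.dot(b, 1), 1)`, which is an untested assumption.

```java
import java.util.function.Function;

class ExpandDimsDemo {
    /** Mimics a graph that is defined once: the row count of the first batch is baked in. */
    static Function<double[], double[][]> reshapeWithFixedRows(int numRows) {
        return v -> {
            if (v.length != numRows) {
                throw new IllegalArgumentException(
                        "expected " + numRows + " values, got " + v.length);
            }
            return expandLastDim(v);
        };
    }

    /** Shape-agnostic: appends an axis of size 1, [n] -> [n][1], for any n. */
    static double[][] expandLastDim(double[] v) {
        double[][] out = new double[v.length][1];
        for (int i = 0; i < v.length; i++) {
            out[i][0] = v[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] firstBatch = {7, 7, 3}; // 3 examples
        double[] lastBatch = {5};        // smaller final batch

        // "Defined" once, using the size of the first batch
        Function<double[], double[][]> fixed = reshapeWithFixedRows(firstBatch.length);
        System.out.println(fixed.apply(firstBatch).length);  // 3

        // The expandDims-style version works for any batch size
        System.out.println(expandLastDim(lastBatch).length); // 1

        try {
            fixed.apply(lastBatch);
        } catch (IllegalArgumentException e) {
            System.out.println("fixed reshape failed: " + e.getMessage());
        }
    }
}
```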