Stacking c copies of a (1,m,n) layer into a (c,m,n) tensor, inside network

Hello,
is there a way to stack c copies of a (1,m,n) layer (2D) into a (c,m,n) tensor, inside the network? I have a (1,m,n) layer L1 and I want to do an element-wise addition with the output of a convolution layer, L2, which is (c,m,n). The shapes are clearly not compatible, so I'd like to produce a tensor of L1 copies with the right shape. This happens in the hidden layers of the network, so I can't do it manually on the input.
Thank you in advance

@EquanimeAugello you’ll want a merge vertex. The output of each layer should be the same dimension.
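As a minimal sketch of what that could look like (the vertex and layer names here are hypothetical, not from your code): a MergeVertex concatenates its inputs along the channel dimension, and since addVertex takes a varargs list of input names, the same call also handles a variable number of inputs:

    // Hypothetical sketch: merge c existing layers, each of shape [minibatch, 1, m, n],
    // into one [minibatch, c, m, n] activation by channel concatenation.
    // Assumed import: org.deeplearning4j.nn.conf.graph.MergeVertex
    String[] inputNames = new String[c];
    for (int j = 0; j < c; j++) {
        inputNames[j] = "copy_" + j; // each "copy_j" layer must already exist in the graph
    }
    graphBuilder.addVertex("stacked", new MergeVertex(), inputNames);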

Note that CNN output is actually going to be 4d, not 3d. Do you mean you want a 1d CNN?

Thank you very much. Ah yes, you are right: my CNN is 2D, so the output is 4d, my mistake. I do have a doubt, then: I need to merge those c instances of the same layer (with c a variable that can change between different instances of the network), but so far I only know how to merge an explicit number of layers manually. Could you kindly suggest a reference for how to do it?

For clarity, here is the code that configures the network I'm working on:

 public ResNetBrain(int boardSize, int nResidualBlocks){
        int arrayArea= (int) Math.pow(boardSize*3,2);
        ComputationGraphConfiguration conf = customResBlocks(
                new NeuralNetConfiguration.Builder()
                    .weightInit(WeightInit.XAVIER)
                    .updater(new Sgd(0.01))
                    //.updater(new Adam(0.01))
                    .graphBuilder()
                    .setInputTypes(InputType.convolutional(boardSize*3, boardSize*3, 1))
                    .addInputs("input"), 
                boardSize, 
                nResidualBlocks // number of repetitions of the residual block
            )
            .addLayer("penultimo", new DenseLayer.Builder().nOut(arrayArea).build(), "A_"+nResidualBlocks+"_3")
            .addLayer("BN_penultimo", new BatchNormalization.Builder().nOut(boardSize).build(), "penultimo")
            .addLayer("ultimo", new DenseLayer.Builder().nOut(1)
                    //.gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
                    .build(),"BN_penultimo")
            .addLayer("output", new OutputLayer.Builder()
                    .lossFunction(new PessimisticLossFunction())
                    .activation(Activation.TANH)
                    .nIn(4).nOut(1).build(), "ultimo")
            .setOutputs("output")
            .build();
        this.net=new ComputationGraph(conf);
        this.net.init();
        
        Nd4j.getExecutioner().setProfilingConfig(ProfilerConfig.builder()
                .checkForINF(true)
                .checkElapsedTime(true)
                .checkLocality(true)
                .checkWorkspaces(true)
                .build());
    }
    public GraphBuilder customResBlocks(GraphBuilder previousArchitecture, int boardSize, int nResidualBlocks){ // the input vertex/layer must be named "input" for this to work
        GraphBuilder resBlocks=previousArchitecture;
        resBlocks=resBlocks
            .addLayer("A_0_3", new BatchNormalization.Builder().build(), "input");
        for(int i=1;i<=nResidualBlocks; i++){
            resBlocks=resBlocks
            .addLayer(i+"_1", new ConvolutionLayer.Builder()
                        .kernelSize(3,3)
                        .padding(1,1)
                        .nIn(1)
                        //Note that nIn need not be specified in later layers
                        .stride(1,1)
                        .nOut(boardSize)
                        //.activation(Activation.LEAKYRELU)
                        .build(),"A_"+(i-1)+"_3" )
            .addLayer("BN_"+i+"_1", new BatchNormalization.Builder().nOut(boardSize).build(), i+"_1")
            .addLayer("A_"+i+"_1", new ActivationLayer(Activation.TANH), "BN_"+i+"_1" )
            .addLayer(i+"_2", new ConvolutionLayer.Builder()
                        .kernelSize(3,3)
                        .padding(1,1)
                        .nIn(boardSize)
                        //Note that nIn need not be specified in later layers
                        .stride(1,1)
                        .nOut(boardSize)
                        //.activation(Activation.LEAKYRELU)
                        .build(),"A_"+i+"_1" )
            .addLayer("BN_"+i+"_2", new BatchNormalization.Builder().nOut(boardSize).build(), i+"_2")
            
            .addLayer("A_tiled_"+(i-1)+"_3", new ConvolutionLayer.Builder() //This tiles the 2D input into a boardsize deep tensor, with different weights (of the (1,1) filters) that acts as different weight in the ElementWise sum between each channel with the residual 2d input
                        .kernelSize(1,1)
                        .nIn(1)
                        //Note that nIn need not be specified in later layers
                        .stride(1,1)
                        .nOut(boardSize)
                        //.activation(Activation.LEAKYRELU)
                        .build(),"A_"+(i-1)+"_3" )
            .addLayer("BN_A_tiled_"+(i-1)+"_3", new BatchNormalization.Builder().nOut(boardSize).build(), "A_tiled_"+(i-1)+"_3")
            
            .addVertex(i+"_3", new ElementWiseVertex(ElementWiseVertex.Op.Add), "BN_"+i+"_2", "BN_A_tiled_"+(i-1)+"_3")
            //.addVertex(i+"_3", new MergeVertex(), "BN_"+i+"_2", "A_"+(i-1)+"_3")
            .addLayer("A_"+i+"_3", new ActivationLayer(Activation.TANH), i+"_3" );
        }
        return resBlocks;
    }

Please note that right now I have a convolutional layer with 1x1 kernels in place of the merge layer I should implement; I thought that, as a temporary measure, it would produce an output of the right shape, although multiplied by arbitrary weights. I did that only for preliminary testing of the rest of the network.

Thank you sincerely,
Equanime

@EquanimeAugello could you clarify your question a bit? Are you not sure how to make everything the same shape and are looking for a path to that? That’s the only way a valid merge can happen.

Hello,
thank you for the reply,
my main doubt is how to go from one layer to multiple (c) copies of it and then stack (merge) them all, since so far I only know how to merge two layers in a ComputationGraph. Thank you very much in advance,
Equanime

@EquanimeAugello sorry, I forgot to follow up here.

Could you clarify what you mean by multiple copies? Repeat might do what you want, but I’m not sure. Do you mean n copies of the same ndarray followed by a merge?

The RepeatVector layer should do what you’re looking for, without all the complicated steps.
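For reference, a rough sketch of that idea (the layer names are made up, and the builder setter is assumed to be repetitionFactor, so please check the javadoc of your DL4J version): RepeatVector takes a 2d (minibatch, size) input and repeats it n times, giving a 3d (minibatch, size, n) output.

    // Hypothetical sketch: repeat a dense activation c times along a new dimension.
    // Assumed import: org.deeplearning4j.nn.conf.layers.misc.RepeatVector (verify for your version)
    graphBuilder.addLayer("repeated",
            new RepeatVector.Builder().repetitionFactor(c).build(),
            "someDenseLayer");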