Methods for dropping out inputs (features) during post-fit Evaluation?

Hi everyone, I heard this is a good place to post questions. I am working on a new approach and will try to describe it below.

Right now I am working with standard feedforward MultiLayerNetworks, using dropout for regularization and building several noisy models this way (saving all seeds and the resulting models). After training, however, I want to do scoring (Evaluation) with some of the input features dropped out, rather than only using dropout during fit optimization. The idea is to use Monte Carlo simulations to find important inputs (features) and correlations between inputs. I have a system with a limited number of examples and far too many features right now: n = 9000 and m = 1200.

I am looking through the documentation and source code (I have changed Nd4j before and got something merged into the main repo for PCA in the past), but I am having trouble seeing which API methods and classes would enable this in DL4j, which is a much bigger codebase. I looked through the Layer class (it would probably be most efficient to alter the model in place, since MC sampling involves making changes to the inputs and then accepting/rejecting those changes based on my criteria - the Evaluation itself is very cheap), and I have also looked at the TransferLearning documentation, but I could not find anything there either. I don't want to do any refitting; I just want to implement a random walker that disconnects/reconnects nodes, with acceptance/rejection criteria based on evaluations.

Does anyone know a way to effectively disconnect nodes within layers after model.fit training and before Evaluation? I'm assuming it has to be in the code somewhere, and I only need it on the native x86 CPU backend.

Thank you!

As you’ve got a simple feedforward / dense network, the approach can be just about as simple.

Remember that a Dense Layer is essentially y = Wx + b, with x being the input vector, W being the weight matrix, b being the bias vector and y being the output vector.

If you want to pretend that x is missing some entries, you can do that either by zeroing those specific values or by zeroing the weights associated with them. The easiest way to approach that would be to call .output(inputs, false, inputMask, null), which will apply the input mask to your inputs and effectively mask out what you want removed.
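If the zeroing route is enough for you, a minimal, untested sketch could look like this. Everything here is a stand-in: the tiny network, the sizes and the random data just make the snippet self-contained, they are not your actual setup.

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.NDArrayIndex;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class FeatureDropSketch {
    public static void main(String[] args) {
        int nIn = 1200, nHidden = 180, nOut = 2;   // placeholder sizes

        // Stand-in for your trained network; in practice you'd keep the fitted model.
        MultiLayerNetwork model = new MultiLayerNetwork(new NeuralNetConfiguration.Builder()
                .list()
                .layer(new DenseLayer.Builder().nIn(nIn).nOut(nHidden).activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(nHidden).nOut(nOut).activation(Activation.SOFTMAX).build())
                .build());
        model.init();

        // Placeholder features, shape [nExamples, nIn].
        INDArray features = Nd4j.rand(100, nIn);

        // "Drop" feature j by zeroing its column before scoring.
        int j = 42;
        INDArray masked = features.dup();
        masked.get(NDArrayIndex.all(), NDArrayIndex.point(j)).assign(0);

        // Score in inference mode (train = false), so dropout is not applied.
        INDArray out = model.output(masked, false);
        System.out.println(out.shapeInfoToString());
    }
}
```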

Or if you don’t like this approach from a fundamental perspective (zero may be a valid value after all!), you can drop the values in the input vector and drop the associated weights from the weight matrix.

To modify a weight matrix, you can use .getParam and .setParam on your model.
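Continuing the sketch above, zeroing the weight row that belongs to a feature instead of the input column might look like this. In a DenseLayer the "0_W" matrix is laid out as [nIn, nOut], so feature j is row j; as long as the shape stays the same, setParam simply copies the values back in.

```java
// Fetch the first layer's weights, shape [nIn, nOut] for a DenseLayer.
// dup() keeps an untouched copy of the current values around, which is handy
// if your MC sampler needs to restore them after a rejected move.
INDArray w = model.getParam("0_W").dup();

// Zero the row for feature j, i.e. disconnect that input from every hidden node.
w.get(NDArrayIndex.point(j), NDArrayIndex.all()).assign(0);

// Shapes match, so this just overwrites the stored values.
model.setParam("0_W", w);

// Score / evaluate as usual afterwards.
INDArray out2 = model.output(features, false);
```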

Or, if you just want to enable dropout and other training-time regularization when computing your output, you can use .output(input, true).

Thanks for the reply, that is extremely helpful! It looks like I need to getParam("0_W"), drop that feature's row from the INDArray, setParam it back, then drop the corresponding column from the input matrix, and then I should have an extremely fast evaluator for MC sampling. I agree with you that 0 is a valid value in the model, so this should make the node effectively "gone" more so than setting the inputs to zero.

Thanks!

Here is the exception I got when trying to reduce the size of "0_W" on the line model.setParam("0_W", W)
(where W had half of its rows removed; I had already reduced the .getFeatures() INDArray matrix too, but never got to the Evaluation lines):

Exception in thread "main" java.lang.IllegalStateException: Cannot assign arrays: arrays must both be scalars, both vectors, or shapes must be equal other than size 1 dimensions. Attempting to do x.assign(y) with x.shape=[1021, 180] and y.shape=[511, 180]

I tried .allowInputModification(true) on Layer 0, but I am still getting this error.
:frowning:

I am now trying to edit the DL4J source code because of these errors. It looks like if BaseLayer.java didn't use assign in its setParam method when the key already exists, I wouldn't get this IllegalStateException, so I changed it to use put(key, value) whether or not the key already exists. model.allowInputModification(true) just throws an error saying that input modification is not implemented yet. @treo, does that make sense?

That is a bad idea, considering what you are trying to do.

The exception you are seeing stems from the fact that DL4J uses one contiguous block of memory for the weights, and you are ripping holes into it.

Honestly, I thought you’d go with the zeroing approach, as it mathematically does exactly the same thing, but is probably a lot faster.

If you still want to continue down the path of changing the actual weight matrix, you’ll have to dig a bit deeper.

What you’d need to do additionally is to use the Transfer Learning API: use nInReplace first to change the size of the weights (they will be randomly initialized), and then, after building the model from there, you can replace the new random weights with the submatrix that you actually want to use.
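Very roughly, and untested, that route could look like the sketch below. The 1021/511 sizes are taken from your exception, keeping the first 511 rows is just for illustration (which rows you keep is up to your sampler), and `model` is assumed to be your trained MultiLayerNetwork.

```java
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.indexing.NDArrayIndex;

// Keep the trained parameters of layer 0 before rebuilding.
INDArray originalW = model.getParam("0_W");   // [1021, 180]
INDArray originalB = model.getParam("0_b");   // bias, shape unchanged by nInReplace

// The rows (features) you want to keep, here simply the first 511 of them.
INDArray keptRows = originalW.get(NDArrayIndex.interval(0, 511), NDArrayIndex.all());

// Rebuild the network with a smaller nIn on layer 0; its weights are re-initialised here.
MultiLayerNetwork smaller = new TransferLearning.Builder(model)
        .nInReplace(0, 511, WeightInit.XAVIER)
        .build();

// Overwrite the fresh random weights with the trained sub-matrix (shapes now match: [511, 180]).
smaller.setParam("0_W", keptRows);
smaller.setParam("0_b", originalB);
```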

Hi @treo, no, I tried the inputMask you suggested first. I tried setting the inputMask from both the model and layer 0, but both seem to want the wrong dimensions: 180 rows (the number of nodes in the first hidden layer that the features connect to in one of my models) rather than 1021 (the number of features in all my models). So it does not seem to support masking out the features; maybe it's picking up the first hidden layer because that's the first one it supports?

Thank you for your help, though. I can zero the rows that I wanted to remove, and that would probably get around this; it does seem to be mathematically the same thing. I'll try it out, and if that doesn't work I'll have to keep thinking…

[image: diagram of a matrix-vector product written out as sums of products, from "Multiplying matrices and vectors" - Math Insight]

Let's say that x is the input and A is the weight matrix. Each output entry is a sum of products, (Ax)_i = A_i1 x_1 + A_i2 x_2 + ... + A_in x_n, so if you want to remove the effect of any one input on the total, setting it to zero makes its product zero and therefore leaves the rest of the sum unchanged.

It does indeed have the same effect, so it should work; it doesn't really matter whether you zero your inputs or the corresponding weight row. But zeroing the inputs makes more sense.
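If you want to convince yourself with plain ND4J, a standalone toy check (small random matrices) would be:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.NDArrayIndex;

// x: inputs [nExamples, nIn], w: weights [nIn, nOut], as in a dense layer pre-activation.
INDArray x = Nd4j.rand(5, 8);
INDArray w = Nd4j.rand(8, 3);
int j = 2;   // the feature to "remove"

// Variant 1: zero column j of the inputs.
INDArray xZeroed = x.dup();
xZeroed.get(NDArrayIndex.all(), NDArrayIndex.point(j)).assign(0);

// Variant 2: zero row j of the weights.
INDArray wZeroed = w.dup();
wZeroed.get(NDArrayIndex.point(j), NDArrayIndex.all()).assign(0);

// Feature j's products vanish from every sum either way, so the results agree.
System.out.println(xZeroed.mmul(w).equalsWithEps(x.mmul(wZeroed), 1e-6));   // prints true
```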