Inference in M2.1 is not idempotent and gives wrong results compared to M1.1

Thanks! This helps a ton. When I said “cold” I was thinking of the state of the cache where the SameDiff instance hasn’t touched those arrays yet, so view creation or other side effects aren’t possible. Let me think about this reproduction a bit.

Got it. I actually meant the state of the SameDiff instance itself, but what you’re saying explains why inference run immediately after loading the model (no fit() beforehand) gives different results in the first round (no cache yet) compared to the following rounds (with the cache already created).

However, regarding the difference between running the use case above and the one where fit() is involved beforehand, I think that having a child SameDiff instance plays some role here, not only the cache itself.
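For reference, here’s roughly how I reproduce that cold-vs-warm difference (just a sketch - the model file name and the `input` / `modelOutput` variable names are placeholders for my real ones):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;
import java.util.Map;

public class ColdInferenceRepro {
    public static void main(String[] args) {
        // Load the trained model; no fit() is called afterwards
        SameDiff sd = SameDiff.load(new File("model.sd"), false);

        // Fresh input array for the placeholder
        INDArray input = Nd4j.rand(1, 128);
        Map<String, INDArray> placeholders = Map.of("input", input);

        // First ("cold") inference right after loading - no cache populated yet
        INDArray first = sd.output(placeholders, "modelOutput").get("modelOutput").dup();

        // Second inference with the identical input - cache is now populated
        INDArray second = sd.output(placeholders, "modelOutput").get("modelOutput").dup();

        // Expected: true. What I observe on M2.1: false (the cold run differs from later runs)
        System.out.println("Outputs identical: " + first.equalsWithEps(second, 1e-6));
    }
}
```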

One cause I’m thinking of is the inputs/outputs being closed and reused. Could you try calling setCloseable(false) on all the various arrays going in and out of your network to see if that has any effect?
This is one theory I have: Disables caching of arrays that are inputs/outputs allowing proper use by agibsonccc · Pull Request #9771 · deeplearning4j/deeplearning4j · GitHub
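Something along these lines (a sketch only - `input` / `modelOutput` are placeholder names, adjust them to your graph):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Map;

public class NonCloseableEval {
    static INDArray evalWithoutRecycling(SameDiff sd) {
        INDArray input = Nd4j.rand(1, 128);
        input.setCloseable(false);   // keep the placeholder array out of the cache

        Map<String, INDArray> out = sd.output(Map.of("input", input), "modelOutput");

        INDArray result = out.get("modelOutput");
        result.setCloseable(false);  // same for the array coming back out of the network
        return result;
    }
}
```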

It might be the inputs being modified that causes the results to differ. I made changes at all execution levels to prevent anything that is passed in, or used as an output, from being put in the cache.

@partarstu I merged these changes to master. It fixed some closing issues for me when attempting to run training. I would appreciate some feedback to see if it works for you.

I’ll try to do that later today and will give you feedback. Just to double-check: you mean the inputs (placeholders) and the outputs of eval(), right?

I’ll also try to use the latest snapshot and test whether the issue still persists. I’m not sure everything will go smoothly, though, because the last time I used a snapshot I ran into a couple of issues :slightly_smiling_face:

@partarstu yes, basically anything that goes into the network as input. JFYI I merged that PR and pushed the branch to snapshots already. Give that a shot as-is first, and if that doesn’t help, keep an eye on the behavior of arrays that get reused.

@agibsonccc Just tried it out with the SNAPSHOT build - the issue remains. What I’ve noticed is that your fix from PR #9771 is not being invoked when running sd.getVariable(MODEL_OUTPUT_VAR_NAME).eval()

I’ll try to do that manually in my code and see what happens.

UPDATE: I called setCloseable(false) on all input arrays, and on the output one as well. It made no difference. In any case, for each “cold” inference I create new arrays based on the input data and run eval() afterwards each time. It seems that making them non-closeable doesn’t change the situation.

@partarstu ok… so it’s not the input/output data that’s still being stored somewhere. I’m asking because anything marked as non-closeable doesn’t get stored for caching and thus doesn’t get recycled. I’ll double-check where else that could be occurring.

@agibsonccc, I’ve also noticed that the outputs of my self-attention layers differ between the first “cold” inference and the subsequent ones. That means something is already being modified in the pipeline, and apparently the weights are affected as well.

@partarstu you’re saying the feed-forward pass modifies the weights?

@partarstu This is another guess: Ensures views aren't returned from cache by agibsonccc · Pull Request #9777 · deeplearning4j/deeplearning4j · GitHub

I’m still convinced this is view-related. Your attention layer has permutes and reshapes in it as well.
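One quick way to check that (a sketch, with a hypothetical variable name): see whether the arrays coming back are views, and snapshot them with dup() before comparing:

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;

public class ViewCheck {
    // "attention_1/softmax" is a hypothetical variable name - substitute the real one
    static INDArray snapshotAttentionOutput(SameDiff sd) {
        INDArray attnOut = sd.getVariable("attention_1/softmax").eval();

        // A view shares its buffer with another array; if that buffer is cached and
        // later reused, the view silently changes its contents.
        System.out.println("attention output is a view: " + attnOut.isView());

        // dup() copies into a standalone buffer, so later buffer reuse can't touch it
        return attnOut.dup();
    }
}
```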

I can’t confirm that, but something is definitely being modified. I’ve just re-run the “cold” inference with the freshest SNAPSHOT libs and with the token embeddings and attention weights frozen, and I still see different outputs starting with the second attention layer. So the first layer produces more or less the same output on every inference, but the second and subsequent layers show different outputs for the first versus the following inferences. That means something happens after the first attention layer’s softmax output, and either the weights or the results of some operation(s) are modified.
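This is roughly how I compare the layer outputs between the first and the following runs (the layer output names here are placeholders for my actual variable names):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;

import java.util.HashMap;
import java.util.Map;

public class LayerDivergenceCheck {
    // Hypothetical layer output names - replace with the actual variable names of the graph
    static final String[] LAYER_OUTPUTS = {"attention_1/out", "attention_2/out", "modelOutput"};

    static void compareColdVsWarm(SameDiff sd, Map<String, INDArray> placeholders) {
        // First ("cold") run: snapshot every intermediate output with dup()
        Map<String, INDArray> cold = new HashMap<>();
        sd.output(placeholders, LAYER_OUTPUTS).forEach((name, arr) -> cold.put(name, arr.dup()));

        // Second ("warm") run with the identical placeholders
        Map<String, INDArray> warm = sd.output(placeholders, LAYER_OUTPUTS);

        // The first layer whose output changes localizes where buffers get modified
        for (String name : LAYER_OUTPUTS) {
            boolean same = cold.get(name).equalsWithEps(warm.get(name), 1e-6);
            System.out.println(name + " identical across runs: " + same);
        }
    }
}
```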

What is also interesting is that every time I run the inference after a fresh JVM restart I get the same results, i.e. the same model outputs (the first-inference results, which differ from those of the following rounds). That kind of determinism suggests there’s some fixed logic involved somewhere after the first attention layer’s softmax output (the attention scores, which I log to the console).

@partarstu try running it later. I just merged the relevant PR and am pushing a snapshot build now: .github/workflows/build-deploy-cross-platform.yml · Workflow runs · deeplearning4j/deeplearning4j · GitHub

@agibsonccc Got it.