Inference in M2.1 is not idempotent and gives wrong results compared to M1.1

Thanks! This helps a ton. When I said “cold” I was thinking of the state of the cache where the SameDiff instance hasn’t touched those arrays yet, so view creation or other side effects aren’t possible. Let me think about this reproduction a bit.

Got it. I actually meant the state of the SameDiff instance itself, but what you’re saying explains why inference run immediately after loading the model (no fit() beforehand) gives different results in the first round (no cache yet) compared to the following rounds (with the cache already created).

However, regarding the difference between running the use case above and the one where fit() is involved beforehand, I think that having a child SameDiff instance plays some role here, not only the cache itself.
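For reference, here’s roughly how I reproduce that cold-vs-warm difference (just a sketch - the model file name and the `input` / `modelOutput` variable names are placeholders for my real ones):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;
import java.util.Map;

public class ColdInferenceRepro {
    public static void main(String[] args) {
        // Load the trained model; no fit() is called afterwards
        SameDiff sd = SameDiff.load(new File("model.sd"), false);

        // Fresh input array for the placeholder
        INDArray input = Nd4j.rand(1, 128);
        Map<String, INDArray> placeholders = Map.of("input", input);

        // First ("cold") inference right after loading - no cache populated yet
        INDArray first = sd.output(placeholders, "modelOutput").get("modelOutput").dup();

        // Second inference with the identical input - cache is now populated
        INDArray second = sd.output(placeholders, "modelOutput").get("modelOutput").dup();

        // Expected: true. What I observe on M2.1: false (the cold run differs from later runs)
        System.out.println("Outputs identical: " + first.equalsWithEps(second, 1e-6));
    }
}
```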

One cause I’m thinking of is the inputs/outputs being closed and reused. Could you try calling setCloseable(false) on all the various arrays going in and out of your network to see if that has any effect?
This is one theory I have: Disables caching of arrays that are inputs/outputs allowing proper use by agibsonccc · Pull Request #9771 · deeplearning4j/deeplearning4j · GitHub
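Something along these lines (a sketch only - `input` / `modelOutput` are placeholder names, adjust them to your graph):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Map;

public class NonCloseableEval {
    static INDArray evalWithoutRecycling(SameDiff sd) {
        INDArray input = Nd4j.rand(1, 128);
        input.setCloseable(false);   // keep the placeholder array out of the cache

        Map<String, INDArray> out = sd.output(Map.of("input", input), "modelOutput");

        INDArray result = out.get("modelOutput");
        result.setCloseable(false);  // same for the array coming back out of the network
        return result;
    }
}
```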

It might be the inputs being modified that causes the results to differ. I made changes at all execution levels to prevent anything that is passed in, or used as an output, from being put in the cache.

@partarstu I merged these changes to master. It fixed some closing issues for me when attempting to run training. I would appreciate some feedback to see if it works for you.

I’ll try to do that later today and will give you feedback. Just to double-check: you mean the inputs (placeholders) and the outputs of eval(), right?

I’ll also try to use the latest snapshot and test whether the issue still persists. I’m not sure everything will go smoothly, though, because the last time I used a snapshot I ran into a couple of issues :slightly_smiling_face:

@partarstu yes, basically anything that goes into the network as input. JFYI I merged that PR and pushed the branch to snapshots already. Give that a shot as-is first, and if that doesn’t help, keep an eye on the behavior of arrays that get reused.

@agibsonccc Just tried it out with the SNAPSHOT build - the issue remains. What I’ve noticed is that your fix from PR #9771 is not being invoked when running sd.getVariable(MODEL_OUTPUT_VAR_NAME).eval()

I’ll try to do that manually in my code and see what happens.

UPDATE: I called setCloseable(false) on all input arrays, and on the output one as well. It made no difference. In any case, for each “cold” inference I create new arrays based on the input data and run eval() afterwards each time. It seems that making them non-closeable doesn’t change the situation.

@partarstu ok… so it’s not the input/output data that’s still being stored somewhere. I’m asking because anything marked as non-closeable doesn’t get stored for caching and thus doesn’t get recycled. I’ll double-check where else that could be occurring.

@agibsonccc, I’ve also noticed that the outputs of my self-attention layers differ between the first “cold” inference and the subsequent ones. That means something is already being modified in the pipeline, and apparently the weights are affected as well.

@partarstu you’re saying the feed-forward pass modifies the weights?

@partarstu This is another guess: Ensures views aren't returned from cache by agibsonccc · Pull Request #9777 · deeplearning4j/deeplearning4j · GitHub

I’m still convinced this is view-related. Your attention layer has permutes and reshapes in it as well.
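One quick way to check that (a sketch, with a hypothetical variable name): see whether the arrays coming back are views, and snapshot them with dup() before comparing:

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;

public class ViewCheck {
    // "attention_1/softmax" is a hypothetical variable name - substitute the real one
    static INDArray snapshotAttentionOutput(SameDiff sd) {
        INDArray attnOut = sd.getVariable("attention_1/softmax").eval();

        // A view shares its buffer with another array; if that buffer is cached and
        // later reused, the view silently changes its contents.
        System.out.println("attention output is a view: " + attnOut.isView());

        // dup() copies into a standalone buffer, so later buffer reuse can't touch it
        return attnOut.dup();
    }
}
```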

I can’t confirm that, but something is definitely being modified. I’ve just re-run the “cold” inference with the freshest SNAPSHOT libs and with the token embeddings and attention weights frozen, and I still see different outputs starting with the second attention layer. So the first layer produces more or less the same output on every inference, but the second and subsequent layers show different outputs for the first versus the following inferences. That means something happens after the first attention layer’s softmax output, and either the weights or the results of some operation(s) are modified.
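This is roughly how I compare the layer outputs between the first and the following runs (the layer output names here are placeholders for my actual variable names):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;

import java.util.HashMap;
import java.util.Map;

public class LayerDivergenceCheck {
    // Hypothetical layer output names - replace with the actual variable names of the graph
    static final String[] LAYER_OUTPUTS = {"attention_1/out", "attention_2/out", "modelOutput"};

    static void compareColdVsWarm(SameDiff sd, Map<String, INDArray> placeholders) {
        // First ("cold") run: snapshot every intermediate output with dup()
        Map<String, INDArray> cold = new HashMap<>();
        sd.output(placeholders, LAYER_OUTPUTS).forEach((name, arr) -> cold.put(name, arr.dup()));

        // Second ("warm") run with the identical placeholders
        Map<String, INDArray> warm = sd.output(placeholders, LAYER_OUTPUTS);

        // The first layer whose output changes localizes where buffers get modified
        for (String name : LAYER_OUTPUTS) {
            boolean same = cold.get(name).equalsWithEps(warm.get(name), 1e-6);
            System.out.println(name + " identical across runs: " + same);
        }
    }
}
```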

What is also interesting is that every time I run the inference after a fresh JVM restart I get the same results, i.e. the same model outputs (the first-inference results, which differ from those of the following rounds). That kind of determinism suggests there’s some fixed logic involved somewhere after the first attention layer’s softmax output (the attention scores, which I log to the console).

@partarstu try running it later. I just merged the relevant PR and am pushing a snapshot build now: .github/workflows/build-deploy-cross-platform.yml · Workflow runs · deeplearning4j/deeplearning4j · GitHub

@agibsonccc Got it.