Missing SD variable issue

adonnini · August 14, 2023, 12:10pm

Hi,

I completed the initial implementation of the “Attention Is All You Need” model and have started debugging training/testing.

I perform sd.fit one dataset at a time.
Processing of the first dataset completes without errors.
Processing of the second dataset fails to complete execution with the error reported below.

A search for information on this error failed to produce any results.

What do you think is going on? Any suggestions as to what I should do next?

Thanks

Exception in thread "main" java.lang.IllegalStateException: SameDiff instance does not have a variable with name "sd_var_70"
	at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:639)
	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:301)
	at org.nd4j.autodiff.samediff.internal.TrainingSession.trainingIteration(TrainingSession.java:95)
	at org.nd4j.autodiff.samediff.SameDiff.fitHelper(SameDiff.java:1936)
	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1792)
	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1660)
	at org.deeplearning4j.examples.trajectorypredictiontransformer.LocationNextNeuralNetworkV7_04.sameDiff3(LocationNextNeuralNetworkV7_04.java:492)
	at org.deeplearning4j.examples.trajectorypredictiontransformer.LocationNextNeuralNetworkV7_04.main(LocationNextNeuralNetworkV7_04.java:193)

treo · August 14, 2023, 12:13pm

Try giving your SDVariables explicit names. That way you’ll get a better clue as to what the problem may be.

adonnini · August 14, 2023, 12:57pm

Thanks. I do that as much as I can. However, when I do that I run into a
different problem: SD variable names already in use. I try to work around
that issue by giving SD variables “name”+“random number” names. However,
even that does not always work.

So, at this point I have used explicit variable names as much as
possible without running into the “already in use” issue.

Regardless, what are some of the possible causes of the “does not have a
variable with name” issue?

Thanks

treo · August 14, 2023, 1:22pm

That suggests that you are doing something weird, like re-declaring parts of the graph.

The fact that fit works on the first dataset but not on the second one suggests that you may be doing something that (re-)defines part of the graph based on the input shapes.

Are you doing that?

adonnini · August 14, 2023, 7:58pm

The transformer model from “Attention Is All You Need” (see attached)
includes multiple instances of certain modules.
I could be wrong, but SameDiff does not let you do that (easily).

For example, in order to implement the model I need to have three
multi-head attention modules.

Given my limited knowledge of SameDiff, my solution is to have three
multi-head attention modules (with different names).

The model also includes multiple instances of normalization,
FeedForward and positional encoding modules.

The model is composed of multiple layers (in the paper, six).

As you can imagine, this meant that the number of variables in my
implementation attempt was big(ger).

After defining the model, I run it by calling sd.fit on one dataset at a time.

As I reported, sd.fit fails with the “SameDiff instance does not have a
variable with name” error.

When I loop through the datasets, the argument for sd.fit is the dataset
in the next iteration of the loop, not the one in the current iteration.
I do this because if the argument is the dataset in the current
iteration of the loop, execution fails with:
“labels and predictions arrays must have the same shapes, but got [32,
2, 33] and [32, 2, 14] correspondingly !”

I am not sure I answered your question. If I did not, please let me
know; in that case I did not quite understand it.

treo · August 15, 2023, 5:46am

No, you didn’t really answer the question, but the other problem you run into also suggests that you’re doing something unexpected.

With SameDiff you define the entire computation once. That means it must be defined in such a way that it works with arbitrarily sized inputs. It must not depend on the particular shape of anything, unless that shape is exactly the same for every single input.
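
To make the "define once, independent of shape" point concrete, here is a minimal plain-Java sketch. It does not use the SameDiff API. The class ShapeTemplate and its matches method are hypothetical names made up for this illustration, and the -1 "any size" marker only mirrors the convention SameDiff placeholders use for variable dimensions. The two shapes are the ones from the error message quoted earlier in the thread, used purely as sample values.

```java
public class ShapeTemplate {

    // A declared dimension of -1 means "any size"; every other value must match exactly.
    static boolean matches(long[] declared, long[] actual) {
        if (declared.length != actual.length) {
            return false; // different rank can never match
        }
        for (int i = 0; i < declared.length; i++) {
            if (declared[i] != -1 && declared[i] != actual[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] hardCoded = {32, 2, 33};   // definition tied to the first dataset's shape
        long[] flexible  = {-1, 2, -1};   // definition that leaves batch and length open

        long[] firstDataset  = {32, 2, 33};
        long[] secondDataset = {32, 2, 14};

        System.out.println(matches(hardCoded, firstDataset));  // true
        System.out.println(matches(hardCoded, secondDataset)); // false
        System.out.println(matches(flexible, firstDataset));   // true
        System.out.println(matches(flexible, secondDataset));  // true
    }
}
```

A definition that bakes in the first dataset's sizes accepts only that dataset, which is the pattern "works on the first fit, fails on the second" would be consistent with. Whether that is what happens in the code under discussion is not confirmed in this thread.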

As for naming things even when there are many parts to the model: a simple hierarchical approach would work great here. For example, the name of the last Normalization in the 6th decoder layer could be “Decoder_6_Norm_3”.
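
A hierarchical scheme like this can be generated rather than typed by hand. The sketch below is plain Java with no SameDiff dependency. LayerNames and its name method are hypothetical helpers invented for this example, and the counts (six decoder layers, three normalizations per layer) are assumptions taken from the example name above.

```java
import java.util.HashSet;
import java.util.Set;

public class LayerNames {

    // Builds names of the form <block>_<layer>_<module>_<index>, e.g. Decoder_6_Norm_3.
    static String name(String block, int layer, String module, int index) {
        return block + "_" + layer + "_" + module + "_" + index;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        // Assumed layout: 6 decoder layers, 3 normalization modules each.
        for (int layer = 1; layer <= 6; layer++) {
            for (int norm = 1; norm <= 3; norm++) {
                String n = name("Decoder", layer, "Norm", norm);
                // Set.add returns false on a duplicate, so a collision is caught immediately.
                if (!seen.add(n)) {
                    throw new IllegalStateException("Duplicate name: " + n);
                }
            }
        }
        System.out.println(seen.size());                    // 18
        System.out.println(name("Decoder", 6, "Norm", 3));  // Decoder_6_Norm_3
    }
}
```

Because each name is derived from its position in the model, it is unique by construction, so no random suffix is needed, and the name in an error message points straight at the layer involved.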

adonnini · August 15, 2023, 10:40am

Thanks very much for the feedback. I’ll have to look at the computation
definition vs. execution issue I have. I like your naming convention idea.
Thanks.

agibsonccc · August 15, 2023, 11:47am

@adonnini note that you can also use sd.withNameScope for variables. Variables created inside the scope will automatically be prefixed.

adonnini · August 15, 2023, 6:14pm

@agibsonccc I took a look at nameScope tests and tried to use it in my code.

It works well. Thanks!

One question. Once a nameScope is defined, it applies to all variables created after its definition unless a new nameScope is defined. Is this correct?

Thanks

agibsonccc · August 15, 2023, 9:43pm

@adonnini the scope is only applicable within the block.
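
To illustrate what "only applicable within the block" means, here is a small plain-Java sketch of a block-scoped prefix built on try-with-resources. It is a stand-in and not the SameDiff implementation: ScopedNames, Scope, withScope and qualify are hypothetical names, and the "/" separator is an assumption made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ScopedNames {

    // Stack of currently open scope names, outermost first.
    private final Deque<String> scopes = new ArrayDeque<>();

    // Hypothetical stand-in for a name scope: pushes a prefix on creation, pops it on close().
    final class Scope implements AutoCloseable {
        Scope(String name) {
            scopes.addLast(name);
        }

        @Override
        public void close() {
            scopes.removeLast();
        }
    }

    Scope withScope(String name) {
        return new Scope(name);
    }

    // Prefixes a name with every scope that is open at the time of the call.
    String qualify(String name) {
        return scopes.isEmpty() ? name : String.join("/", scopes) + "/" + name;
    }

    public static void main(String[] args) {
        ScopedNames names = new ScopedNames();

        try (Scope decoder = names.withScope("Decoder_6")) {
            System.out.println(names.qualify("Norm_3"));     // Decoder_6/Norm_3
            try (Scope inner = names.withScope("Attention")) {
                System.out.println(names.qualify("W_q"));    // Decoder_6/Attention/W_q
            }
        }

        // The scope was closed at the end of the try block, so no prefix applies here.
        System.out.println(names.qualify("Norm_3"));         // Norm_3
    }
}
```

In this sketch the prefix is removed as soon as the try block exits, so names created afterwards are unprefixed until another scope is opened.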
