I completed the initial implementation of the “Attention Is All You Need” model and have started debugging training/testing.
I run sd.fit one dataset at a time.
Processing of the first dataset completes without errors.
Processing of the second dataset fails with the error reported below.
A search for information on this error failed to produce any results.
What do you think is going on? Any suggestions as to what I should do next?
Exception in thread "main" java.lang.IllegalStateException: SameDiff instance does not have a variable with name "sd_var_70"
Try giving your SDVariables explicit names. That way you’ll get a better clue as to what the problem may be.
Thanks. I do that as much as I can. However, when I do, I run into a
different problem: SD variable names already in use. I try to avoid
that by giving SD variables names of the form “name”+“random number”.
However, even that does not always work.
So, at this point I have used explicit variable names as much as
possible without running into the “already in use” issue.
Regardless, what are some of the possible causes of the “does not have a
variable with name” issue?
That suggests that you are doing something weird, like re-declaring parts of the graph.
The fact that fit works on the first dataset but not on the second suggests that you may be doing something that (re-)defines something based on the input shapes.
Are you doing that?
The transformer model from “Attention Is All You Need” (see attached)
includes multiple instances of certain modules.
I could be wrong, but I think SameDiff does not let you do that (easily).
For example, in order to implement the model I need to have three
multi-head attention modules.
Given my limited knowledge of SameDiff, my solution is to have three
multi-head attention modules (with different names).
The model also includes multiple instances of the normalization,
FeedForward and positional encoding modules.
The model is composed of multiple layers (in the paper, six).
As you can imagine, this meant that the number of variables in my
implementation attempt was big(ger).
After defining the model, I run it executing sd.fit one dataset at a time.
As I reported, sd.fit fails with the “SameDiff instance does not have a
variable with name” error.
When I loop through the datasets, the argument for sd.fit is the dataset
in the next iteration of the loop, not the one in the current iteration.
I do this because if the argument is the dataset in the current
iteration of the loop, execution fails with
“labels and predictions arrays must have the same shapes, but got [32,
2, 33] and [32, 2, 14] correspondingly !”
I am not sure I answered your question. If I did not, please let me
know; it would mean I did not quite understand it.
No, you didn’t really answer the question, but the other problem you run into also suggests that you’re doing something unexpected.
With SameDiff you define the entire computation once. That means it must be defined in such a way that it works with arbitrarily sized inputs, and it must not depend on the particular shape of things unless that shape is exactly the same in every single input.
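One way to picture "define once, run on any size" is a shape template in which the dimensions that may vary are marked with a wildcard. The sketch below is plain Java, not SameDiff code: the class ShapeTemplate and the use of -1 as the wildcard are assumptions made purely for illustration. It checks the two label shapes from the error message above against a flexible template and against one pinned to the first dataset.

```java
// Toy illustration only: ShapeTemplate is a hypothetical helper, not a SameDiff class.
final class ShapeTemplate {
    // Assumption: -1 marks a dimension whose size may differ between batches.
    static final long ANY = -1;

    private final long[] dims;

    ShapeTemplate(long... dims) {
        this.dims = dims.clone();
    }

    // True when the concrete shape has the same rank and agrees on every fixed dimension.
    boolean accepts(long... shape) {
        if (shape.length != dims.length) {
            return false;
        }
        for (int i = 0; i < dims.length; i++) {
            if (dims[i] != ANY && dims[i] != shape[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Batch size and sequence length are left open; the middle dimension is fixed.
        ShapeTemplate flexible = new ShapeTemplate(ANY, 2, ANY);
        // A definition that was (accidentally) tied to the first dataset's shape.
        ShapeTemplate pinned = new ShapeTemplate(32, 2, 33);

        System.out.println(flexible.accepts(32, 2, 33)); // true
        System.out.println(flexible.accepts(32, 2, 14)); // true
        System.out.println(pinned.accepts(32, 2, 14));   // false
    }
}
```

A definition that behaves like the pinned template would work on the first dataset and break on the second, which matches the symptom described in this thread.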
As for naming things when there are many parts: a simple hierarchical approach would work great here. For example, the name of the last Normalization in the 6th decoder layer could be “Decoder_6_Norm_3”.
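The naming scheme can be sketched as a small helper that joins the module type and its indices. This is plain Java and not part of SameDiff; the class name HierarchicalNames and the method of are hypothetical. Because every name is derived from the module's position in the model, no random suffix is needed and the same position always gets the same name.

```java
// Hypothetical helper for illustration; not part of SameDiff.
final class HierarchicalNames {
    // Joins the parts with underscores, e.g. ("Decoder", 6, "Norm", 3) -> "Decoder_6_Norm_3".
    static String of(Object... parts) {
        StringBuilder sb = new StringBuilder();
        for (Object part : parts) {
            if (sb.length() > 0) {
                sb.append('_');
            }
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Six decoder layers with three normalizations each give 18 distinct names.
        for (int layer = 1; layer <= 6; layer++) {
            for (int norm = 1; norm <= 3; norm++) {
                System.out.println(of("Decoder", layer, "Norm", norm));
            }
        }
    }
}
```

The resulting string would then be passed wherever an explicit variable name is given when the graph is built.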
Thanks very much for the feedback. I’ll have to look at the computation
definition vs. execution issue I have. I like your naming convention idea.
@adonnini note that you can also use sd.withNameScope for variables. Variables will automatically be prefixed.
@agibsonccc I took a look at nameScope tests and tried to use it in my code.
It works well. Thanks!
One question. Once a nameScope is defined, it applies to all variables created after its definition unless a new nameScope is defined. Is this correct?
@adonnini the scope is only applicable within the block.
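To make "only applicable within the block" concrete, here is a toy model of block-scoped prefixes in plain Java. It is not SameDiff's implementation: the class ToyScopes, its methods, and the "/" separator are assumptions for illustration, and the scope is modelled as a resource that is closed at the end of a try block.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stand-in for illustration only; this is NOT SameDiff's implementation.
final class ToyScopes {
    // Handle for an open scope; close() declares no checked exception,
    // so try-with-resources needs no catch clause.
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    private final Deque<String> stack = new ArrayDeque<>();

    // Opens a scope; closing the returned handle removes it again.
    Scope withNameScope(String name) {
        stack.addLast(name);
        return () -> stack.removeLast();
    }

    // Prefixes the name with every scope that is currently open (assumed "/" separator).
    String qualify(String name) {
        if (stack.isEmpty()) {
            return name;
        }
        return String.join("/", stack) + "/" + name;
    }

    public static void main(String[] args) {
        ToyScopes names = new ToyScopes();
        try (Scope encoder = names.withNameScope("encoder")) {
            System.out.println(names.qualify("w")); // encoder/w
            try (Scope layer = names.withNameScope("layer1")) {
                System.out.println(names.qualify("w")); // encoder/layer1/w
            }
            System.out.println(names.qualify("b")); // encoder/b
        }
        // Outside the block the prefix no longer applies.
        System.out.println(names.qualify("w")); // w
    }
}
```

In this model a name created after the block ends gets no prefix, which is the behaviour the reply above describes.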