@adonnini could you clarify what you’re looking for with masking? Masking is used with dynamic time step sizes. It adds padding that allows a static set of weights to be compatible with different input sizes. Masks are just variables you use in the context of samediff. We explain masking here: Recurrent Layers - Deeplearning4j
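To give you a rough idea (this is just a sketch with made-up shapes, not something taken from your code): with the usual [miniBatchSize, featureCount, timesteps] layout, a mask is simply a [miniBatchSize, timesteps] array of ones and zeros marking which steps are real and which are padding:

INDArray features = Nd4j.zeros(DataType.FLOAT, 2, 4, 5); // 2 sequences, 4 features each, padded to 5 time steps
INDArray featureMask = Nd4j.zeros(DataType.FLOAT, 2, 5); // 1 = real time step, 0 = padding
featureMask.get(NDArrayIndex.point(0), NDArrayIndex.interval(0, 3)).assign(1.0); // first sequence has 3 real steps
featureMask.get(NDArrayIndex.point(1), NDArrayIndex.all()).assign(1.0); // second sequence uses all 5 steps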
I thought it would be applicable to my problem, where the step size (dim2) is dynamic.
However, I have no idea how to use masking, although I did read and understand the explanation in the documentation.
My only experience with masking is using the SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END parameter when defining the training and testing dataset iterators.
I read the explanation of masking in the documentation again. I am confused by the reference to figures and arrays in this section.
In any event, it does not help in figuring out how to pad an SDVariable. I also read the documentation about the pad function. Where could I look to see how it is done?
@adonnini there is a pad op for that: sd.nn().pad(…). Padding is mainly used with CNNs though. Could you elaborate on what you’re trying to do exactly? Are you trying to set up your datasets with a mask? Honestly, without running your code it’s kind of hard to see what you’re doing.
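Something along these lines (an untested sketch; maxSteps and dim2 are placeholder names). The important detail is that the paddings argument is an integer array of shape [rank, 2], i.e. one (padBefore, padAfter) pair per dimension, not an array with the padded target shape:

SDVariable paddings = sd.constant("paddings", Nd4j.createFromArray(new int[][]{
        {0, 0}, // batch dimension: no padding
        {0, 0}, // feature dimension: no padding
        {0, maxSteps - dim2} // time dimension: pad at the end
}));
SDVariable labelPadded = sd.nn().pad("labelPadded", label, paddings, PadMode.CONSTANT, 0.0);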
I mentioned masking/padding only as a guess about something I could use to resolve my problem.
To recap:
1. Using samediff I am trying to replicate the one-layer LSTM based regression network I implemented using dl4j.
2. My starting point was this code:
3. In my input dataset the shape is variable; specifically, dim2 changes from one dataset to the next.
4. If my understanding is correct, since the weights shape needs to be compatible with the input shape, and the input dataset shape is variable, to address this I loop through the dataset iterator containing my input datasets, build my network from scratch for each dataset, and perform a training run for each dataset (see the rough sketch after this list).
5. The approach described in 4) has been working up to a point. If I read the logs correctly, all operations work except for logloss, where the label shape and the output shape are not compatible. Specifically, dim1 is different (in message 39 above I explained in detail what the issue is).
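Very roughly, the loop looks like this (a simplified sketch, the actual variable names in my code are different):

while (trainIter.hasNext()) {
    DataSet ds = trainIter.next();
    long dim2 = ds.getFeatures().shape()[2]; // the time dimension changes from dataset to dataset
    SameDiff sd = SameDiff.create(); // rebuild the whole graph for this dim2
    // ... define the placeholders, the lstm layer and the loss using dim2 ...
    // ... run a training fit on this single dataset ...
}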
Please let me know what information you need. I would be glad to post my entire code for you to run.
I would welcome any suggestions on the approach I described above. If I should use different code as the starting point for my implementation, please let me know.
Thanks.
I am running into this error:
Op [pad] failed check for input [1], DataType: [INT16]
Here is my code:
SDVariable padding = sd.constant("padding", Nd4j.zeros(DataType.INT16, 32,4,dim2));
SDVariable labelPadded = sd.nn.pad("labelPadded", label, padding, PadMode.CONSTANT, 0.0);
SDVariable loss = sd.loss.logLoss("loss", labelPadded, out);
The error seems to point to the constant padding. I am not sure what’s wrong with it.
What did I do wrong?
Thanks
@adonnini based on your inputs there, the labels seem like they’re mismatched? Could you clarify what the expected label shape is, and what shape you get without padding?
Hi Adam,
The input data set-up is exactly the same as it was for the networks implemented with dl4j. Nothing changed.
Given this, there is no mismatch between feature files and label files.
Feature file shape is [batchSize, nIn, number of rows]
Label file shape is [batchSize, nOut, number of rows]
This has always been the case and has not caused any problems whatsoever when implementing networks with dl4j.
When attempting to use samediff to implement the simplest of the networks I implemented with dl4j, the mismatch occurs after processing of lstmlayer when attempting to produce the loss.
Specifically, logloss has two inputs:
- softmax output with shape [batchSize, nIn, number of rows]
- label with shape [batchSize, nOut, number of rows]
Here is the error message:
LOG_LOSS_GRAD OP: labels and predictions arrays must have the same shapes, but got [32, 2, 57] and [32, 6, 57] correspondingly !
I would be glad to upload the code for you to run, if you have the time and think it would help.
@agibsonccc would the suggestion you made in
apply in my case? May I ask what you meant by “make that a placeholder”? I am not sure I understand.
Thanks
Can you share both your original dl4j setup and your current samediff setup (ideally in a self-contained repository that we can run)?
This thread has gotten awfully long without much progress, and it seems that by not having the actual code you are using, @agibsonccc can only really guess and give vague answers.
And you on the other side try out a lot of things, but that also muddies the waters for anyone trying to understand what is going on.
So I think it is going to be most helpful if we all take a step back, get a statement of what exactly you are trying to do now (the what, not the how!) and the code that goes along with it.
Thanks.
My goal is to duplicate all the networks I implemented using DL4J with SameDiff. Ultimately, I would like to implement a transformer model for my application (location tracking).
In https://github.com/ActionConsulting/NeuralNetwork I created two folders:
- DL4J
- SameDiff
Folder DL4J contains the code to create, train and test the regression network I created using DL4J.
Folder SameDiff contains the code which is my attempt at duplicating the regression network created using DL4J.
Both folders also contain the raw input csv data file which typically is in src/main/assets. In both applications the path to the raw data file is
String gpsTrackPointsFilePath = "src/main/assets/neuralNetworkDataSet.csv";
You should be able to run both code instances. Please let me know if you encounter any problems or have any questions.
Adam has access to the repository. I just invited you too.
Thanks for your offer to help. I really appreciate it
Unfortunately that is not actually a runnable project. Even after transplanting things into the examples project (which it looks like you are working in) and putting the assets folder where it looks like it should go, your supposedly working DL4J version fails to run.
So again, please provide a clean self-contained repository that we can use to actually run your code.
Start from the mvn-project-template: deeplearning4j-examples/mvn-project-template at master · deeplearning4j/deeplearning4j-examples · GitHub
Why do we insist that you provide something that we can just clone and run?
Because in the 20 minutes I just spent trying to figure out why the thing you provided doesn’t work at all, I could have instead stepped through your code to see if my intuition of what is wrong is correct and provide you with a more helpful answer.
I completely understand. I am sorry you wasted your time.
I confess my ignorance, having never done this. I don’t normally work on github.
So, I uploaded the entire project as a zip file to the repository you have access to.
The zip archive name is mvn-project-template.zip. You can download it, extract it, and open it with IntelliJ IDEA.
I have run both the dl4j and samediff versions from IntelliJ IDEA without problems.
I am sorry if this is not exactly what you are looking for. If this is too time consuming, I understand.
There are quite a lot of things that you should do differently.
Naming things dim0, dim1, dim2 is one of them. The problem here is that those names are meaningless. And because of that they are confusing.
DataVec’s default RNN tensor layout is [miniBatchSize, featureCount, timesteps], and for outputs it is [miniBatchSize, labelCount, timesteps]. You have chosen to use the [miniBatchSize, timesteps, featureCount] option in your LSTM config. So those things already don’t go well with each other. It would be quite easily apparent if you had chosen a better naming scheme for your variables.
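Just to illustrate (this is not your actual code, and featureCount here is a placeholder): if your data comes out of DataVec as [miniBatchSize, featureCount, timesteps], but you configure the LSTM to expect [miniBatchSize, timesteps, featureCount], you would have to permute the input first, and do the same for the labels:

SDVariable features = sd.placeHolder("features", DataType.FLOAT, -1, featureCount, -1); // as produced by the record reader
SDVariable featuresNTS = features.permute(0, 2, 1); // reordered to [miniBatchSize, timesteps, featureCount]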
Next, you go on and create more complication for yourself by using a bi-directional LSTM. When you’re still struggling to figure out the appropriate shapes of things, going unidirectional is a lot better, as the shapes for that are a lot simpler.
Next, you are trying to apply a simple matrix multiplication in a batched multi-step context. Depending on what you want to do, there are several different options here, but I’m not going to go into this right now.
Because there are so many layers to your problems, I suggest you start over but with none of the complexity you’ve integrated here.
All of the things you want to do have an immediate mode execution equivalent.
Instead of using SameDiff.rnn you can use Nd4j.rnn, and instead of using variables, you can just directly instantiate things.
Instead of loading your files, you can just hard code an example.
That way you can step through your code with a debugger and just see what the effect of everything is immediately.
That will help you figure out the shapes that you need and the approach to the calculation that you want.
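For example, something as small as this (hypothetical shapes, only meant to show the idea) already lets you print and inspect every intermediate shape directly:

INDArray features = Nd4j.rand(new int[]{2, 5, 3}); // [miniBatchSize, timesteps, featureCount]
INDArray w = Nd4j.rand(3, 4); // per-step projection weights
INDArray flat = features.reshape(2 * 5, 3); // collapse batch and time into one dimension
INDArray projected = flat.mmul(w).reshape(2, 5, 4); // back to [miniBatchSize, timesteps, nOut]
System.out.println(java.util.Arrays.toString(projected.shape())); // check the resulting shape immediately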
Then once you’ve figured out those things switch to SameDiff, add your data loading code and add a trainer. Otherwise there will be just too many things that can go wrong.
As you are still learning, try to learn only one thing at a time. We humans are terrible at multi-tasking.
Thanks very much for the advice and “roadmap” to follow. It looks like I have quite a bit of work to do!
I hope it’s not a problem if I use this thread in case I have any questions or get stuck.
One very quick question before I get started, just to make sure I am on the right track when it comes to shapes.
In a feature file where the number of columns is featureCount, the number of rows is timesteps, right?
Thanks.
Yes.
Maybe start a new thread. This one is quite messy. Ideally, every thread should become one more resource for someone to help themselves.