Issues with modifying the source code

Hi all, for my research I need to split the forward and backward phases of training into two separate functions; in DL4J both are done in a single function (i.e. fit()). Can anyone give me some advice on how to achieve this? I sincerely appreciate your help!

@fubuki you can either try using SameDiff (the lower-level API with more control) or use DL4J's external errors to do your own backpropagation.
In that case, you can call the gradient functions on your own.

Otherwise, for the DL4J API, a test example can be found here:
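
The gist of that example, as a rough sketch: `model`, `features`, and `externalError` below are stand-ins for your own initialized network, input batch, and externally computed dL/dOutput array.

```java
import java.util.List;

import org.deeplearning4j.nn.gradient.Gradient;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.workspace.LayerWorkspaceMgr;
import org.nd4j.common.primitives.Pair;
import org.nd4j.linalg.api.ndarray.INDArray;

public class ExternalErrorsSketch {

    // One training step driven by an externally computed error (dL/dOutput)
    static void step(MultiLayerNetwork model, INDArray features, INDArray externalError) {
        // 1. Forward pass only, in training mode (activations are kept for backprop)
        List<INDArray> activations = model.feedForward(features, true);
        INDArray output = activations.get(activations.size() - 1);

        // 2. Backward pass, seeded with the external error instead of a built-in loss
        Pair<Gradient, INDArray> p = model.backpropGradient(externalError, LayerWorkspaceMgr.noWorkspaces());
        Gradient gradient = p.getFirst();        // per-parameter gradients
        INDArray epsilonAtInput = p.getSecond(); // dL/dInput, useful for chaining networks

        // 3. Let the updater apply learning rate / momentum, then update parameters in place
        int iteration = 0, epoch = 0;
        int minibatchSize = (int) features.size(0);
        model.getUpdater().update(model, gradient, iteration, epoch, minibatchSize,
                LayerWorkspaceMgr.noWorkspaces());
        model.params().subi(gradient.gradient());
    }
}
```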

Thank you for your advice! For the first solution, do you mean that I can achieve my idea using only the SameDiff API, without using DL4J?

@fubuki generally SameDiff is meant for lower-level control of graphs, similar to TensorFlow/PyTorch. You can look at:

Take a look at our quickstart as well:

It's up to you which route to go. Long term, SameDiff is superseding DL4J as the main API, though. DL4J's ComputationGraph and MultiLayerNetwork do have this functionality built in via external errors, as mentioned above. SameDiff is capable of external errors as well.
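
To give a flavor of that lower-level style, a toy SameDiff graph might look like the following; the names, shapes, and loss here are made up purely for illustration.

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

public class TinyGraphSketch {

    static SameDiff build() {
        SameDiff sd = SameDiff.create();

        // Placeholders are fed concrete arrays at execution time
        SDVariable in = sd.placeHolder("input", DataType.FLOAT, -1, 4);
        SDVariable label = sd.placeHolder("label", DataType.FLOAT, -1, 3);

        // Trainable variables with concrete initial values
        SDVariable w = sd.var("w", Nd4j.randn(DataType.FLOAT, 4, 3));
        SDVariable b = sd.var("b", Nd4j.zeros(DataType.FLOAT, 1, 3));

        // Forward graph plus a simple mean-squared-error style loss
        SDVariable out = sd.nn().softmax("out", in.mmul(w).add(b));
        SDVariable diff = out.sub(label);
        sd.mean("loss", diff.mul(diff));
        sd.setLossVariables("loss");
        return sd;
    }
}
```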

You may find more here:

I got it. Thank you for your guidance!

Hi, after reading the GitHub link you shared, I am still confused about the backward process in SameDiff. Generally, the backward process uses the forward output to perform backpropagation and updates the model's weights using the gradients. I am not sure which function / step updates the weights of the model.

@fubuki you would use external errors in combination with the fit function in that case. You can also use calculateGradients. After specifying external errors as in the examples listed, you can configure a training configuration similar to the other training examples.
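
A minimal sketch of the training-configuration part, assuming a graph with placeholders named "input" and "label" (those names, the updater choice, and the iterator are illustrative):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.TrainingConfig;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.learning.config.Adam;

public class TrainingSketch {

    static void train(SameDiff sd, DataSetIterator trainData) {
        TrainingConfig config = new TrainingConfig.Builder()
                .updater(new Adam(1e-3))          // the updater applies lr/momentum for you
                .dataSetFeatureMapping("input")   // placeholder that receives the features
                .dataSetLabelMapping("label")     // placeholder that receives the labels
                .build();
        sd.setTrainingConfig(config);

        int numEpochs = 1;
        sd.fit(trainData, numEpochs);             // forward + backward + update in one call
    }
}
```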

I will try it out. Thank you for your help!

Hi, if I use calculateGradients() to get the gradients, should I update each variable by writing code like "var.get('w').subi(grad.mul(lr))"? Or is there a function I can use to update the variables?

@fubuki Yes, you would usually update them like that.
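
To make that concrete, here is a hedged sketch of a manual SGD step built on calculateGradients; the placeholder names and the variables "w"/"b" are assumptions carried over from your snippet.

```java
import java.util.HashMap;
import java.util.Map;

import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;

public class ManualSgdSketch {

    static void sgdStep(SameDiff sd, INDArray features, INDArray labels, double lr) {
        Map<String, INDArray> placeholders = new HashMap<>();
        placeholders.put("input", features);   // assumed placeholder names
        placeholders.put("label", labels);

        // Gradients of the loss w.r.t. the named variables
        Map<String, INDArray> grads = sd.calculateGradients(placeholders, "w", "b");

        // Plain SGD, applied in place: param -= lr * grad
        for (Map.Entry<String, INDArray> e : grads.entrySet()) {
            INDArray param = sd.getVariable(e.getKey()).getArr();
            param.subi(e.getValue().mul(lr));
        }
    }
}
```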

Got it. Thanks a lot!

Hi! When I use calculateGradients() and ExternalErrorFunction to execute the backward pass, it fails and reports "No array was provided for required placeholder variable 'input'". But I have already set a value for the "input" variable. How should I handle this? Thank you!

@fubuki that is saying that no gradient for input has been calculated, not that you didn't set a value for input. It sounds like there is an issue here somewhere. Could you file an issue with a reproducer for me to look at? Generally, placeholders shouldn't need gradients, though. I will need to look at this in more detail.

It's hard to tell whether there is a bug here or not. Thanks!

What about providing you with the source code on GitHub (test function included)? I will add some comments to the code explaining what my program is trying to do. Thank you!

@fubuki sure, happy to review.

@agibsonccc Here is the source code link: GitHub - fubukishiro/FTPipeHD_MC
The error occurs in the MNISTCNNTest/subModelTrainTest() function, on line 118 of MNISTCNNTest.java. I made a TODO comment on that line.
Forgive me for reusing some code from the SameDiff source code.
Thank you sincerely for your help!

@agibsonccc What my program wants to do is split a model into three sub-models. It first runs the forward pass through the three sub-models in consecutive order, then runs the backward pass through them in reverse order. For instance, the output of sub-model1 is the input of sub-model2, and the gradient of the input of sub-model2 is the external error of sub-model1. I have sketched the data flow below.
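
In code-sketch form, the flow looks roughly like the following. I am writing it with DL4J MultiLayerNetwork calls only as stand-ins (my actual implementation uses SameDiff); sub1/sub2/sub3 and lossGradWrtOut3 are hypothetical names.

```java
import java.util.List;

import org.deeplearning4j.nn.gradient.Gradient;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.workspace.LayerWorkspaceMgr;
import org.nd4j.common.primitives.Pair;
import org.nd4j.linalg.api.ndarray.INDArray;

public class PipelineSketch {

    static void step(MultiLayerNetwork sub1, MultiLayerNetwork sub2, MultiLayerNetwork sub3,
                     INDArray features, INDArray lossGradWrtOut3) {
        // Forward in consecutive order: each sub-model's last activation feeds the next
        List<INDArray> a1 = sub1.feedForward(features, true);
        List<INDArray> a2 = sub2.feedForward(a1.get(a1.size() - 1), true);
        List<INDArray> a3 = sub3.feedForward(a2.get(a2.size() - 1), true);

        // Backward in reverse order: the epsilon at each sub-model's input
        // (p.getSecond()) becomes the external error of the previous sub-model
        Pair<Gradient, INDArray> p3 = sub3.backpropGradient(lossGradWrtOut3, LayerWorkspaceMgr.noWorkspaces());
        Pair<Gradient, INDArray> p2 = sub2.backpropGradient(p3.getSecond(), LayerWorkspaceMgr.noWorkspaces());
        Pair<Gradient, INDArray> p1 = sub1.backpropGradient(p2.getSecond(), LayerWorkspaceMgr.noWorkspaces());
        // ...then apply each sub-model's updater and parameter update as in the external-errors example
    }
}
```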

@agibsonccc Hi, have you found any issues in my implementation? Thanks!

@fubuki sorry, I haven't had the bandwidth to do that. Assume that any time you ask me to look at source code, it's going to involve me cloning a repository, potentially correcting some code, and testing against multiple versions (potentially M1.1 and snapshots) in case I find issues. It's not just a matter of scanning the repo and magically finding the problem.

These things are usually at least a 1-2 hour affair that requires dedicated time. I'll try to get back to you early next week, after I've cleared out my backlog. If we're lucky, earlier.

@agibsonccc I totally understand that you are busy right now. I will wait for you to finish your backlog. I have always appreciated your help with my issue!