Assign value to a placeholder

I suspected as much. It just makes it difficult to verify that the various operations (initial definition, permutation, and reshaping) produce the expected shapes.

This is just a guess: would using eval, applying it to a variable, work to get the information I am looking for?

Thanks

Yes, you should be able to eval it and get the shape that way.

those operations are not in-place, so you’ll need to actually output their results.
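For example, something along these lines (a minimal sketch; layer0Permuted and placeholderData are the names used in your code) lets you check what a permute actually produced:

    // eval() runs the graph up to this variable using the supplied placeholder
    // values and returns an INDArray, whose shape you can then print.
    INDArray permuted = layer0Permuted.eval(placeholderData);
    System.out.println(Arrays.toString(permuted.shape()));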

Also the parameters to permute and reshape are quite wrong.

Check the docs:
permute
reshape

@adonnini that means the permute axes were invalid.
One recommendation I have is to understand what each function does. I feel like you’re not putting much effort there (or at least that’s not coming through in your posts)

I know you’re trying but I encourage you to try to understand the error messages. You won’t make a lot of progress without that. You’ve been here long enough that I think you should at least be able to understand what some of the ops do (eg: a matrix multiply needs columns on the left to be equal to rows on the right) or what some of the more basic ops like permute do.

99% of the time you’ll be dealing with shape errors. That’s how you build networks.

With that in mind, I’d like you to understand what permute is and does. You pass in valid axis indices (0 up to the rank minus 1, so 0 to 2 for a 3d variable), one for each dimension, to rearrange the dimensions of the variable in some way. That’s what the value here is telling you.

You shouldn’t be passing in timesteps or…really anything like what you’re passing to permute for the reason I described above.

Those all should be dimensions. Try to rewrite your permute to reflect that.

Fix that first then we can deal with infinite/nan. That usually comes from some sort of underflow or reading invalid values from an input.

You are right. Let me work on making sure I understand how permute works in this context.
Thanks to you both for your help and your patience

Also note that what I’m saying here are not the actual parameters to the methods. They are the resulting shapes.

That is why I told you to read the documentation again. Permute takes a list of axis indices to shuffle them around, and reshape then takes actual sizes to reinterpret the data as a differently shaped tensor.
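For instance, with the shapes from your own network (a sketch; lstmOut here just stands in for whatever variable holds your lstm output):

    // lstmOut has shape [miniBatchSize, nOut, timeSteps] in NST format.
    // permute takes axis indices: keep axis 0, swap axes 1 and 2.
    SDVariable permuted = sd.permute(lstmOut, 0, 2, 1);   // -> [miniBatchSize, timeSteps, nOut]
    // reshape takes concrete sizes; -1 tells nd4j to infer that dimension.
    SDVariable flat = sd.reshape(permuted, -1, nOut);     // -> [miniBatchSize*timeSteps, nOut]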

I did, and it finally registered that permute takes indices, not values.


My biggest trouble all along has been not in understanding how matrix/algebra operations work, but in understanding the nomenclature/terminology used in dl4j/nd4j and how matrix/algebra operations are invoked there. Add to that the fact that, understandably given the massive volume of information, some of the documentation contains errors.

I make all of this worse by not taking the time to really understand how you have implemented matrix/algebra operations. I am trying to slow down.

I have corrected the embarrassing error I made in my first attempt to use permute. Now, the permute/reshape pairing works.

Now, when running the code there are two problems:

  1. After about half a dozen datasets, all values turn to nan

  2. When processing the 17th dataset, for some unknown (to me) reason, the value of dim0 (miniBatchSize) changes from 32 to 22 and execution fails.

Below, I posted the latest update of the code, and the last portion of the log.

Please let me know what you think is going on and what I should be doing next.

Thanks.

 Printing sd information
--- Summary ---
Variables:               17                   (13 with arrays)
Functions:               8                   
SameDiff Function Defs:  0                   
Loss function variables: [loss]

--- Variables ---
- Name -     - Array Shape -     - Variable Type -   - Data Type-        - Output Of Function -  - Inputs To Functions -
b1           [2]                 VARIABLE            FLOAT               <none>                                      
bias         [8]                 VARIABLE            FLOAT               <none>                  [lstmLayer]         
input        [32, 6, 33]         PLACEHOLDER         FLOAT               <none>                  [lstmLayer]         
label        [32, 2, 33]         PLACEHOLDER         FLOAT               <none>                  [log_loss]          
loss         -                   ARRAY               FLOAT               log_loss(log_loss)                          
lstmLayer    [32, 2, 33]         ARRAY               FLOAT               lstmLayer(lstmLayer)    [permute]           
lstmLayer:1  [32, 2]             ARRAY               FLOAT               lstmLayer(lstmLayer)                        
matmul       [1056, 2]           ARRAY               FLOAT               matmul(matmul)          [reshape_1]         
out          -                   ARRAY               FLOAT               softmax(softmax)        [log_loss]          
permute      [32, 33, 2]         ARRAY               FLOAT               permute(permute)        [reshape]           
permute_1    [32, 2, 33]         ARRAY               FLOAT               permute_1(permute)      [softmax]           
rWeights     [2, 8]              VARIABLE            FLOAT               <none>                  [lstmLayer]         
reshape      [1056, 2]           ARRAY               FLOAT               reshape(reshape)        [matmul]            
reshape_1    [32, 33, 2]         ARRAY               FLOAT               reshape_1(reshape)      [permute_1]         
sd_var       []                  CONSTANT            FLOAT               <none>                  [log_loss]          
w1           [2, 2]              VARIABLE            FLOAT               <none>                  [matmul]            
weights      [6, 8]              VARIABLE            FLOAT               <none>                  [lstmLayer]         


--- Functions ---
     - Function Name -  - Op -      - Inputs -                        - Outputs -               
0    lstmLayer          LSTMLayer   [input, weights, rWeights, bias]  [lstmLayer, lstmLayer:1]  
1    permute            Permute     [lstmLayer]                       [permute]                 
2    reshape            Reshape     [permute]                         [reshape]                 
3    matmul             Mmul        [reshape, w1]                     [matmul]                  
4    reshape_1          Reshape     [matmul]                          [reshape_1]               
5    permute_1          Permute     [reshape_1]                       [permute_1]               
6    softmax            SoftMax     [permute_1]                       [out]                     
7    log_loss           LogLoss     [out, sd_var, label]              [loss]                    

Added differentiated op log_loss
Added differentiated op softmax
Added differentiated op permute_1
Added differentiated op reshape_1
Added differentiated op matmul
Added differentiated op reshape
Added differentiated op permute
Added differentiated op lstmLayer
.
.
.
Executing op: [lstmLayer]
About to get variable in  execute output
node_1:0 result shape: [32, 2, 43]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
About to get variable in  execute output
node_1:1 result shape: [32, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Debug info for node_2 input[0]; shape: [32, 2, 43]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [permute]
About to get variable in  execute output
node_1:0 result shape: [32, 43, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [zeros_as]
About to get variable in  execute output
node_1:0 result shape: [32, 2]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Debug info for node_2 input[0]; shape: [32, 43, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Reshape: Optional reshape arg was -99
Removing variable <1:0>
Executing op: [reshape]
Reshape: Optional reshape arg was -99
Reshape: new shape: {1376, 2}
About to get variable in  execute output
node_1:0 result shape: [1376, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [shape_of]
About to get variable in  execute output
node_1:0 result shape: [3]; dtype: INT64; first values [32, 43, 2]
Removing variable <1:0>
Removing variable <1:1>
Executing op: [matmul]
About to get variable in  execute output
node_1:0 result shape: [1376, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [1376, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Reshape: Optional reshape arg was -99
Removing variable <1:0>
Executing op: [reshape]
Reshape: Optional reshape arg was -99
Reshape: new shape: {32, 43, 2}
About to get variable in  execute output
node_1:0 result shape: [32, 43, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Executing op: [shape_of]
About to get variable in  execute output
node_1:0 result shape: [2]; dtype: INT64; first values [1376, 2]
Debug info for node_2 input[0]; shape: [32, 43, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Executing op: [permute]
About to get variable in  execute output
node_1:0 result shape: [32, 2, 43]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Executing op: [softmax]
About to get variable in  execute output
node_1:0 result shape: [32, 2, 43]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Executing op: [log_loss_grad]
About to get variable in  execute output
node_1:0 result shape: [32, 2, 43]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:1 result shape: []; dtype: FLOAT; first values [-nan]
About to get variable in  execute output
node_1:2 result shape: [32, 2, 43]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Removing variable <1:1>
Executing op: [softmax_bp]
About to get variable in  execute output
node_1:0 result shape: [32, 2, 43]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [32, 2, 43]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Executing op: [permute]
About to get variable in  execute output
node_1:0 result shape: [32, 43, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [32, 43, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[1]; shape: [2]; ews: [1]; order: [c]; dtype: [INT64]; first values: [1376, 2]
Reshape: Optional reshape arg was 1376
Removing variable <1:0>
Removing variable <1:1>
Executing op: [reshape]
Reshape: Optional reshape arg was 1376
Reshape: new shape: {1376, 2}
About to get variable in  execute output
node_1:0 result shape: [1376, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Executing op: [matmul_bp]
Executing op: [matmul]
About to get variable in  execute output
node_1:0 result shape: [1376, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Executing op: [matmul]
About to get variable in  execute output
node_1:0 result shape: [2, 2]; dtype: FLOAT; first values [nan, nan, nan, nan]
About to get variable in  execute output
node_1:0 result shape: [1376, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:1 result shape: [2, 2]; dtype: FLOAT; first values [nan, nan, nan, nan]
Executing op: [adam_updater]
About to get variable in  execute output
node_1:0 result shape: [2, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:1 result shape: [2, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:2 result shape: [2, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan]
Executing op: [subtract]
About to get variable in  execute output
node_1:0 result shape: [2, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [1376, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[1]; shape: [3]; ews: [1]; order: [c]; dtype: [INT64]; first values: [32, 43, 2]
Reshape: Optional reshape arg was 32
Removing variable <1:0>
Removing variable <1:1>
Executing op: [reshape]
Reshape: Optional reshape arg was 32
Reshape: new shape: {32, 43, 2}
About to get variable in  execute output
node_1:0 result shape: [32, 43, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [32, 43, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Executing op: [permute]
About to get variable in  execute output
node_1:0 result shape: [32, 2, 43]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [32, 6, 43]; ews: [1]; order: [f]; dtype: [FLOAT]; first values: [-0.226728, -0.226728, -0.129559, -0.097169, -0.161948, -0.161948, -0.161948, -0.161948, -0.161948, -0.161948, -0.097169, -0, -0, -0, -0, -0]
Debug info for node_2 input[1]; shape: [6, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[2]; shape: [2, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[3]; shape: [8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Removing variable <1:3>
Removing variable <1:4>
Removing variable <1:5>
Executing op: [lstmLayer_bp]
About to get variable in  execute output
node_1:0 result shape: [32, 6, 43]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
About to get variable in  execute output
node_1:1 result shape: [6, 8]; dtype: FLOAT; first values [-nan, -nan, nan, nan, nan, nan, nan, nan, -nan, -nan, nan, nan, nan, nan, nan, nan, -nan, -nan, nan, nan, nan, nan, nan, nan, -nan, -nan, nan, nan, nan, nan, nan, nan]
About to get variable in  execute output
node_1:2 result shape: [2, 8]; dtype: FLOAT; first values [-nan, -nan, nan, nan, nan, nan, nan, nan, -nan, -nan, nan, nan, nan, nan, nan, nan]
About to get variable in  execute output
node_1:3 result shape: [8]; dtype: FLOAT; first values [-nan, -nan, nan, nan, nan, nan, nan, nan]
Executing op: [adam_updater]
About to get variable in  execute output
node_1:0 result shape: [6, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:1 result shape: [6, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:2 result shape: [6, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Executing op: [subtract]
About to get variable in  execute output
node_1:0 result shape: [6, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Executing op: [adam_updater]
About to get variable in  execute output
node_1:0 result shape: [2, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:1 result shape: [2, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:2 result shape: [2, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Executing op: [subtract]
About to get variable in  execute output
node_1:0 result shape: [2, 8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Executing op: [adam_updater]
About to get variable in  execute output
node_1:0 result shape: [8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:1 result shape: [8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
About to get variable in  execute output
node_1:2 result shape: [8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Executing op: [subtract]
About to get variable in  execute output
node_1:0 result shape: [8]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [22, 6, 19]; ews: [1]; order: [f]; dtype: [FLOAT]; first values: [-0.0647794, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0, -0]
Debug info for node_2 input[1]; shape: [6, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[2]; shape: [2, 8]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Removing variable <1:0>
Removing variable <1:1>
Removing variable <1:2>
Removing variable <1:3>
Executing op: [lstmLayer]
About to get variable in  execute output
node_1:0 result shape: [22, 2, 19]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
About to get variable in  execute output
node_1:1 result shape: [22, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Debug info for node_2 input[0]; shape: [22, 2, 19]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [permute]
About to get variable in  execute output
node_1:0 result shape: [22, 19, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [zeros_as]
About to get variable in  execute output
node_1:0 result shape: [22, 2]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Debug info for node_2 input[0]; shape: [22, 19, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Reshape: Optional reshape arg was -99
Removing variable <1:0>
Executing op: [reshape]
Reshape: Optional reshape arg was -99
Reshape: new shape: {418, 2}
About to get variable in  execute output
node_1:0 result shape: [418, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [shape_of]
About to get variable in  execute output
node_1:0 result shape: [3]; dtype: INT64; first values [22, 19, 2]
Removing variable <1:0>
Removing variable <1:1>
Executing op: [matmul]
About to get variable in  execute output
node_1:0 result shape: [418, 2]; dtype: FLOAT; first values [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Debug info for node_2 input[0]; shape: [418, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan]
Reshape: Optional reshape arg was -99
Error at [/home/runner/work/deeplearning4j/deeplearning4j/libnd4j/include/ops/declarable/generic/shape/reshape.cpp:163:0]:
Reshape: lengths before and after reshape should match, but got 836 vs 832
Removing variable <1:0>
Exception in thread "main" java.lang.RuntimeException: Op reshape with name reshape_1 failed to execute. Here is the error from c++: Op validation failed
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.calculateOutputShape(NativeOpExecutioner.java:1672)
	at org.nd4j.linalg.api.ops.DynamicCustomOp.calculateOutputShape(DynamicCustomOp.java:696)
	at org.nd4j.autodiff.samediff.internal.InferenceSession.getAndParameterizeOp(InferenceSession.java:1363)
	at org.nd4j.autodiff.samediff.internal.InferenceSession.getAndParameterizeOp(InferenceSession.java:68)
	at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:531)
	at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:154)
	at org.nd4j.autodiff.samediff.internal.TrainingSession.trainingIteration(TrainingSession.java:129)
	at org.nd4j.autodiff.samediff.SameDiff.fitHelper(SameDiff.java:1936)
	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1792)
	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1732)
	at org.nd4j.autodiff.samediff.config.FitConfig.exec(FitConfig.java:172)
	at org.nd4j.autodiff.samediff.SameDiff.fit(SameDiff.java:1712)
	at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6_03.sameDiff3(LocationNextNeuralNetworkV6_03.java:240)
	at org.deeplearning4j.examples.quickstart.modeling.recurrent.LocationNextNeuralNetworkV6_03.main(LocationNextNeuralNetworkV6_03.java:141)

    private static int nIn = 6;
    private static int nOut = 2;
    private static int labelCount = 2;

    private static int miniBatchSize = 32;
    private static int numLabelClasses = -1;

    private static SameDiff sd = SameDiff.create();

    private static long dim0 = 0L;
    private static long  dim1 = 0L;
    private static long dim2 = 0L;

    private static Map<String,INDArray> placeholderData = new HashMap<>();

    private static DataSet t;

    public static void sameDiff3() throws IOException, InterruptedException
    {

        // ----- Load the training data -----
        trainFeatures = new CSVSequenceRecordReader();
        trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath() + "/%d.csv", 0, lastTrainCount));
        trainLabels = new CSVSequenceRecordReader();
        trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath() + "/%d.csv", 0, lastTrainCount));

        trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels, miniBatchSize, numLabelClasses,
                true, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

        // ----- Load the test data -----
        //Same process as for the training data.
        testFeatures = new CSVSequenceRecordReader();
        testFeatures.initialize(new NumberedFileInputSplit(featuresDirTest.getAbsolutePath() + "/%d.csv", 0, lastTestCount));
        testLabels = new CSVSequenceRecordReader();
        testLabels.initialize(new NumberedFileInputSplit(labelsDirTest.getAbsolutePath() + "/%d.csv", 0, lastTestCount));

        testData = new SequenceRecordReaderDataSetIterator(testFeatures, testLabels, miniBatchSize, numLabelClasses,
                true, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

        normalizer = new NormalizerStandardize();
        normalizer.fitLabel(true);
        normalizer.fit(trainData);           //Collect the statistics (mean/stdev) from the training data. This does not modify the input data
        trainData.reset();

        while(trainData.hasNext()) {
            normalizer.transform(trainData.next());     //Apply normalization to the training data
        }

        while(testData.hasNext()) {
            normalizer.transform(testData.next());         //Apply normalization to the test data. This is using statistics calculated from the *training* set
        }

        trainData.reset();
        testData.reset();

        trainData.setPreProcessor(normalizer);
        testData.setPreProcessor(normalizer);

        System.out.println(" Printing traindata dataset shape - 1");
        DataSet data = trainData.next();
        System.out.println(Arrays.toString(data.getFeatures().shape()));

        System.out.println(" Printing testdata dataset shape - 1");
        DataSet data2 = testData.next();
        System.out.println(Arrays.toString(data2.getFeatures().shape()));

        trainData.reset();
        testData.reset();

        UIServer uiServer = UIServer.getInstance();
        StatsStorage statsStorage = new InMemoryStatsStorage();
        uiServer.attach(statsStorage);
        int listenerFrequency = 1;
        sd.setListeners(new ScoreListener());

        t = trainData.next();
        dim0 = t.getFeatures().size(0);
        dim1 = t.getFeatures().size(1);
        dim2 = t.getFeatures().size(2);
        System.out.println(" features - dim0 - 0 - "+dim0);
        System.out.println(" features - dim1 - 0 - "+dim1);
        System.out.println(" features - dim2 - 0 - "+dim2);
        trainData.reset();

        getConfiguration();


        int nEpochs = 4;
        for (int i = 0; i < nEpochs; i++) {
            Log.info("Epoch " + i + " starting. ");
            History history = sd.fit(trainData, 1);
            trainData.reset();
            Log.info("Epoch " + i + " completed. ");
        }

        System.out.println(" Starting test data evaluation --- ");

        //Evaluate on test set:
        String outputVariable = "out";
        Evaluation evaluation = new Evaluation();
        sd.evaluate(testData, outputVariable, evaluation);

        //Print evaluation statistics:
        System.out.println(" evaluation.stats() - "+evaluation.stats());


        String pathToSavedNetwork = "src/main/assets/location_next_neural_network_v6_03.zip";
        File savedNetwork = new File(pathToSavedNetwork);

        sd.save(savedNetwork, true);
//        ModelSerializer.addNormalizerToModel(savedNetwork, normalizer);

        System.out.println("----- Example Complete -----");

        //Save the trained network for inference - FlatBuffers format
        File saveFileForInference = new File("src/main/assets/sameDiffExampleInference.fb");

        try {
            sd.asFlatFile(saveFileForInference);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

    }

    private static void getConfiguration()
    {

        placeholderData = new HashMap<>();

        //Create input and label variables
        SDVariable input = sd.placeHolder("input", DataType.FLOAT, miniBatchSize, nIn, -1);
        SDVariable label = sd.placeHolder("label", DataType.FLOAT, miniBatchSize, nOut, -1);

        placeholderData.put("input",  t.getFeatures());
        placeholderData.put("label", t.getLabels());

        LSTMLayerConfig mLSTMConfiguration = LSTMLayerConfig.builder()
                .lstmdataformat(LSTMDataFormat.NST)
                .directionMode(LSTMDirectionMode.FWD)
                .gateAct(LSTMActivations.SIGMOID)
                .cellAct(LSTMActivations.SOFTPLUS)
                .outAct(LSTMActivations.SOFTPLUS)
                .retFullSequence(true)
                .retLastC(false)
                .retLastH(true)
                .build();

        LSTMLayerOutputs outputs = new LSTMLayerOutputs(sd.rnn.lstmLayer(
                input,
                LSTMLayerWeights.builder()
                        .weights(sd.var("weights", Nd4j.rand(DataType.FLOAT, nIn, 4 * nOut)))
                        .rWeights(sd.var("rWeights", Nd4j.rand(DataType.FLOAT, nOut, 4 * nOut)))
                        .bias(sd.var("bias", Nd4j.rand(DataType.FLOAT, 4 * nOut)))
                        .build(),
                mLSTMConfiguration), mLSTMConfiguration);

        // t.getFeatures().size(0) == input.getShape()[0] == miniBatchSize
        // t.getFeatures().size(1) == input.getShape()[1] == nIn
        // t.getFeatures().size(2) == input.getShape()[2] == TimeSteps

        SDVariable layer0 = outputs.getOutput();

        SDVariable layer0Permuted = sd.permute(layer0, 0, 2, 1);
        
        SDVariable layer0PermutedReshaped = sd.reshape(layer0Permuted, -1, nOut);

        SDVariable w1 = sd.var("w1", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT, nOut, labelCount);
        SDVariable b1 = sd.var("b1", Nd4j.rand(DataType.FLOAT, labelCount));

        SDVariable mmulOutput = layer0PermutedReshaped.mmul(w1);

        SDVariable mmulOutputUnreshaped = sd.reshape(mmulOutput, miniBatchSize, -1, labelCount);

        SDVariable mmulOutputUnreshapedUnPermuted = sd.permute(mmulOutputUnreshaped, 0, 2, 1);

        SDVariable out = sd.nn.softmax("out", mmulOutputUnreshapedUnPermuted);

        SDVariable loss = sd.loss.logLoss("loss", label, out);

        sd.setLossVariables("loss");

        double learningRate = 1e-3;
        TrainingConfig config = new TrainingConfig.Builder()
                .l2(1e-4)                               //L2 regularization
                .updater(new Adam(learningRate))        //Adam optimizer with specified learning rate
                .dataSetFeatureMapping("input")         //DataSet features array should be associated with variable "input"
                .dataSetLabelMapping("label")           //DataSet label array should be associated with variable "label"
                .build();

        sd.setTrainingConfig(config);

        System.out.println(" Printing sd information");
        System.out.println(sd.summary());

    }

I don’t know if this is helpful: for the nodes where dim0 is reported as 22, the number of elements printed under “first values” is still 32.

Executing op: [lstmLayer]
About to get variable in  execute output
node_1:0 result shape: [22, 2, 19]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
About to get variable in  execute output
node_1:1 result shape: [22, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Debug info for node_2 input[0]; shape: [22, 2, 19]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [permute]
About to get variable in  execute output
node_1:0 result shape: [22, 19, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Removing variable <1:0>
Executing op: [zeros_as]
About to get variable in  execute output
node_1:0 result shape: [22, 2]; dtype: FLOAT; first values [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Debug info for node_2 input[0]; shape: [22, 19, 2]; ews: [1]; order: [c]; dtype: [FLOAT]; first values: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
Reshape: Optional reshape arg was -99
Removing variable <1:0>
Executing op: [reshape]
Reshape: Optional reshape arg was -99
Reshape: new shape: {418, 2}
About to get variable in  execute output
node_1:0 result shape: [418, 2]; dtype: FLOAT; first values [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]

When your total number of examples is not cleanly divisible by the batch size, the last batch can end up smaller. That is normal, and nothing in your setup should depend on the batch size directly anyway. Instead you should use the sizeAt operation that gives you an SDVariable typed result.

With samediff you are setting up the structure of the computation, so you shouldn’t use fixed values unless they stay fixed for all iterations.
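As a rough sketch (assuming sd.sizeAt is available in your nd4j version; check the exact signature against the docs):

    // Query dimension sizes as SDVariables instead of hardcoding them, so the
    // graph follows whatever shape the current minibatch actually has
    // (32 for most batches, 22 for the last one; time steps also vary).
    SDVariable batchSize = sd.sizeAt(input, 0);
    SDVariable timeSteps = sd.sizeAt(input, 2);

Values like these can then be combined into a shape for the reshape overload that takes an SDVariable shape rather than fixed longs (if your version provides one), so nothing in the graph depends on the literal 32.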

That can happen due to a variety of reasons, this includes too high learning rates, missing normalization of input data, inappropriate initialization, exploding gradients, etc…

What values do you see before it gets to NaN? Is there a trend?

As usual, samediff operations stump me. I tried using sizeAt. I tried to apply it to the input dataset and ran into a NullPointerException, as I expected, since SD variables are definitions, not instantiations.

I tried using the INDArray version of sizeAt. It did not cause NullPointerExceptions but failed with the same shape error.

However, I solved the problem in a way that does not involve samediff. I am pretty frustrated by my inability to use some of the most basic samediff operations. I would still like to use sizeAt. I don’t know whether I am applying it to the wrong variable or at the wrong place in the code. There are only three places where I use miniBatchSize: when I define the input and label placeholders, and when I reshape after matmul; all of these are in the getConfiguration() method.

Now, after fixing the shape problem without using samediff, execution fails with this error:

Cannot perform evaluation with NaNs present in predictions: 32 NaNs present in predictions INDArray

Regarding the possible causes you pointed out in your last message, in particular missing normalization of the input data:

I have five other networks created with dl4j that train and test well to completion with the exact same input data. And these networks work fairly well in inference mode in my Android app. This would lead me to believe that the issue does not lie with the input data.

The process of input data set-up preparation including normalization and pre-processing is exactly the same as the one I used when building the other five networks.

When building those networks I tested various combinations of learning rates and regularization. Learning rate of 1e-3 and regularization of 1e-4 should be OK.

I would appreciate a pointer for diagnosing the nan problem.

I did some more work trying to diagnose the -nan problem. It turns out that it is preceded (caused?) by a -inf problem when executing matmul, specifically when I multiply the lstmLayer output by a weights variable. Here is the line of code:

        SDVariable mmulOutput = layer0PermutedReshaped.mmul(w1).add(b1);

w1 and b1 are defined as follows:

        SDVariable w1 = sd.var("w1", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT, nOut, labelCount);
        SDVariable b1 = sd.constant("b1", 0.05);

Please note that if I do not include

add(b1)

the problem still occurs. It really seems to happen when executing matmul.

What do you think may be happening? Any suggestions for what I should try next?

Thanks

inf happens due to a float overflow, so either the lstm outputs get too large, or something is up with w1.
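One way to narrow that down is to eval the two matmul inputs with your placeholder map and look at their magnitudes (a sketch using the names from your posted code; a FLOAT overflows to inf around 3.4e38):

    // Check the largest absolute values feeding the matmul that produces -inf.
    INDArray lstmVals = layer0PermutedReshaped.eval(placeholderData);
    INDArray w1Vals = w1.eval(placeholderData);
    System.out.println("max |lstm out| = " + Transforms.abs(lstmVals).maxNumber());
    System.out.println("max |w1|       = " + Transforms.abs(w1Vals).maxNumber());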

Some progress: changing the cell and output activations from SOFTPLUS to TANH resolved the inf and -nan problems.

I don’t think it’s working quite yet. Below, you will find the evaluation output. What do you make of the evaluation information?

By the way, results visualization does not work for me.

sd.setListeners( new StatsListener(statsStorage, listenerFrequency));

does not work as StatsListener does not appear to be supported in SameDiff

and

sd.setListeners(new ScoreListener(listenerFrequency, true, true));

does not seem to do anything

EVALUATION OUTPUT

 evaluation.stats() - 

========================Evaluation Metrics========================
 # of classes:    2
 Accuracy:        0.6562
 Precision:       0.6375
 Recall:          0.9273
 F1 Score:        0.7556
Precision, recall & F1: reported for positive class (class 1 - "1") only


=========================Confusion Matrix=========================
  0  1
-------
 12 29 | 0 = 0
  4 51 | 1 = 1

Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================

I changed the evaluation to RegressionEvaluation. The results are below. They look OK. What do you think?

When I try to add the normalizer to the saved network I get

java.util.zip.ZipException: zip END header not found

error.

I think it’s because adding a normalizer to a saved network is supported only if the network is saved using ModelSerializer, which is not supported by SameDiff (is this true?).

In order to run inference from my Android app I need to be able to save the normalizer.

evaluation.stats() - Column    MSE            MAE            RMSE           RSE            PC             R^2            
col_0     2.28702e-01    3.67798e-01    4.78228e-01    1.01028e+00    -1.79478e-01   -1.02771e-02   
col_1     5.53711e-02    1.86274e-01    2.35311e-01    1.45110e+00    -1.02073e-01   -4.51098e-01   

@adonnini samediff and dl4j are different file formats. Samediff is a flatbuffers-based format. You’d need to save them separately. They’re not zip files, as you’re seeing from the error.

I see. OK. Then, could you please tell me how I add the normalizer to the saved network when using samediff?

Also, I assume that to restore a network and its normalizer, I will have to use a samediff method, correct?

When restoring a network and its normalizer using dl4j I use output and ‘revertLabels’ for inference.

Which are their equivalents in samediff?

Thanks

@adonnini you’d just save and load the normalizer separately and use it before you put anything into a placeholder map in samediff.

Edit: to answer your other question, there doesn’t need to be one. You use the same normalizers. If you’ve used .output at all, you should know you get ndarrays out just like from a network. You’d just do the exact same thing and use your existing normalizer.
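A minimal sketch of that flow, assuming NormalizerSerializer (double-check the class and method names against your nd4j version):

    // Save the fitted normalizer on its own, next to the flatbuffers file.
    File normalizerFile = new File("src/main/assets/location_next_normalizer.bin");
    NormalizerSerializer.getDefault().write(normalizer, normalizerFile);

    // At inference time: restore it, normalize the raw features, then feed them
    // into the placeholder map exactly as during training.
    NormalizerStandardize restored = NormalizerSerializer.getDefault().restore(normalizerFile);
    INDArray features = Nd4j.rand(DataType.FLOAT, 1, 6, 33);   // stand-in for real input
    restored.transform(features);
    Map<String, INDArray> placeholders = new HashMap<>();
    placeholders.put("input", features);
    INDArray prediction = sd.getVariable("out").eval(placeholders);
    restored.revertLabels(prediction);   // undo the label normalization, as with dl4j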

Thanks. I had reached the same conclusion regarding the normalizer.

I am sorry about reporting this next error. I really feel as if I am missing something.

When I try to retrieve a saved network with SameDiff.fromFlatFile the result is a NullPointerException.

I double checked that the file exists at the location I try to retrieve it from and that it is not empty.

When I run the code in debug mode, the debugger pinpoints the failure in File.java here:

else if (uri.isOpaque()) {
            throw new IllegalArgumentException("URI is not hierarchical");
        }

My URI is

        File saveFileForInference = new File("/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/deeplearning4j-examples-master_1/dl4j-examples/src/main/assets/location_next_neural_network_v6_07.fb");

Execution of exists(), length(), and getAbsolutePath() for the file in question all work. The failure also occurs if I use a relative path instead of the absolute path.

Is it possible that in handling the URI in the samediff code it becomes “opaque”?

What am I missing/doing wrong?

@adonnini Does the file exist? I’d need the full stack trace. fromFlatFile isn’t opaque. You can download the source for the library right in your IDE from maven central. All it does is loads the bytes from the file and then parses them using flatbuffers.

I’d need the full stack trace…I feel like if you spent some time on this you’d probably see the file was missing.

General suggestion for any and all java stuff: you’re not coding in c++. Segfaults and process crashes don’t happen unless it fails at that level.

That isn’t the case here. You can take a second to track down a null pointer. Run it at the point of exception in your debugger. You can usually give a lot more information if you do that.