ND4JIllegalStateException

On Windows with beta7 I can train models with data type DOUBLE or FLOAT, but when I try to train with HALF this exception is thrown:
Caused by: org.nd4j.linalg.exception.ND4JIllegalStateException: Op [histogram] X argument uses leaked workspace pointer from workspace [WS_LAYER_WORKING_MEM]: Workspace the array was defined in is no longer open.

Also I saw that HALF is deprecated and FLOAT16 should be used, but when I try to train with that, this happens:
IllegalStateException:Data type must be a floating point type: one of DOUBLE, FLOAT, or HALF. Got datatype: BFLOAT16
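For context, this is roughly how I set the data type (a simplified sketch, the layer setup below is just a placeholder and not my real network). With DataType.DOUBLE or DataType.FLOAT this trains fine; switching the same line to DataType.HALF is what triggers the exception above:

import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Placeholder network, only the dataType() call is the relevant part
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .dataType(DataType.HALF)   // DOUBLE and FLOAT work, HALF fails during training
        .updater(new Adam(1e-3))
        .graphBuilder()
        .addInputs("in")
        .addLayer("dense", new DenseLayer.Builder().nIn(784).nOut(128)
                .activation(Activation.RELU).build(), "in")
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nIn(128).nOut(10).build(), "dense")
        .setOutputs("out")
        .build();

ComputationGraph net = new ComputationGraph(conf);
net.init();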

@lavajaw can you give us something to reproduce this? FWIW, training models with half precision varies on Intel chips and generally works on GPU, but in general it is a bit hit or miss. It's not really recommended if you can help it. If you want half precision, I would recommend optimizing the model for inference after training instead.

These are my Gradle dependencies:
implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-beta7'
implementation 'org.deeplearning4j:deeplearning4j-ui:1.0.0-beta7'
implementation 'org.deeplearning4j:deeplearning4j-zoo:1.0.0-beta7'
implementation 'ch.qos.logback:logback-classic:1.2.3'
implementation 'org.projectlombok:lombok:1.18.12'

//  GPU
implementation group: 'org.nd4j', name: 'nd4j-cuda-10.2-platform', version: '1.0.0-beta7'
implementation group: 'org.deeplearning4j', name: 'deeplearning4j-cuda-10.2', version: '1.0.0-beta7'

17:25:52.497 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
17:25:52.516 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 10.2.89
17:25:52.517 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [GeForce RTX 2070]; cc: [7.5]; Total memory: [8589934592]
17:25:53.187 [main] INFO org.deeplearning4j.nn.graph.ComputationGraph - Starting ComputationGraph with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]

Caused by: org.nd4j.linalg.exception.ND4JIllegalStateException: Op [histogram] X argument uses leaked workspace pointer from workspace [WS_LAYER_WORKING_MEM]: Workspace the array was defined in is no longer open.
All open workspaces:

Then I tried adding:
.trainingWorkspaceMode(WorkspaceMode.NONE)
.inferenceWorkspaceMode(WorkspaceMode.NONE)
and it didn't crash, but all evaluation values were NaN.

I wanted to try HALF because the model should be executed on an Android device, so I wanted to check whether there is any improvement in execution speed.

Regarding “I would recommend optimizing it for inference instead after training”: I am not sure how to do this. Do you have any links/examples so I can learn more about this?

@quickwritereader could you take a look at HALF execution? ^

@lavajaw
I talked to the team. I was testing a simple example: depending on the updater and other parameters, training with FLOAT16 could give either suboptimal results or NaNs, so I was advised to convert the model after training. That reduced my model size and gave almost the same evaluation results.
You can use .convertDataType(DataType.FLOAT16) and save it.
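Something along these lines (a minimal sketch, the file names are just placeholders):

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.api.buffer.DataType;

// Load the network that was trained in FLOAT, convert its parameters to FLOAT16, and save the result
ComputationGraph trained = ModelSerializer.restoreComputationGraph("model_float32.zip");
ComputationGraph half = trained.convertDataType(DataType.FLOAT16);
ModelSerializer.writeModel(half, "model_float16.zip", false); // false = don't save updater state, not needed for inference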
Coming to performance: on ARM, float16 is promoted to float for computations, but Armv8.2-A and later architectures use hardware instructions for float16. So if your model is small enough, I do not see any reason to use it for now.
Thanks

@quickwritereader thanks.

Just so you know, I tested my model, the same one I provided to you in the DM.
Results:

ComputationGraph neuralNetwork = ModelSerializer.restoreComputationGraph(path).convertDataType(DataType.HALF);
INDArray[] output = neuralNetwork.output(indArray.dup());

Inference with .convertDataType(DataType.HALF) was 2 times slower than with .convertDataType(DataType.FLOAT) or .convertDataType(DataType.DOUBLE), which is very strange.
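For reference, this is roughly how I timed it (a simplified sketch, the iteration count and input shape are placeholders and not my real setup):

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

ComputationGraph halfNet = ModelSerializer.restoreComputationGraph(path).convertDataType(DataType.HALF);
ComputationGraph floatNet = ModelSerializer.restoreComputationGraph(path).convertDataType(DataType.FLOAT);
INDArray input = Nd4j.rand(DataType.FLOAT, 1, 3, 224, 224); // placeholder input shape

// Time repeated forward passes with the HALF network
long start = System.nanoTime();
for (int i = 0; i < 100; i++) {
    halfNet.output(input.dup());
}
System.out.println("HALF:  " + (System.nanoTime() - start) / 1e6 + " ms");

// Time repeated forward passes with the FLOAT network
start = System.nanoTime();
for (int i = 0; i < 100; i++) {
    floatNet.output(input.dup());
}
System.out.println("FLOAT: " + (System.nanoTime() - start) / 1e6 + " ms");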

Though it should be an improvement because of cache and memory bandwidth, it seems that having only limited instructions that promote float16 to float32 is not enough. But on the latest chips, float16 as well as bfloat16 will be faster and widely supported.