Releasing memory from Datavec Local?

Hello All,

I’ve set up a mini-server that is constantly performing inferences very quickly in real time, but I also need it to perform a transform on a small amount of data. The LocalTransformExecutor is quite a bit faster than Spark (Local); however, I’m finding that it doesn’t release its memory. Running LocalTransformExecutor.execute() over and over, I eventually get:
    org.apache.arrow.memory.OutOfMemoryException: Failure allocating buffer.
    at io.netty.buffer.PooledByteBufAllocatorL.allocate(
    at org.apache.arrow.memory.AllocationManager.(

This is easily tested by simply putting an execute in a loop. The memory won’t be released until you exit the JVM. If I pause the data collection it won’t gc, and if I put in a System.gc() it does nothing. I’ve tried many of the Nd4j static memory management commands to try to release the allocations (purgeCache etc.), and even tried to set up workspaces. I know that workspaces have been implemented and fixed similar problems with training/inference, but I don’t think they’re set up on local DataVec. From what I can tell, it seems to be Arrow holding onto references, but I’m not sure.

Does anyone know a simple way to get it to release its memory, or am I waiting for a fix here (or trying to fix it myself)?

Thanks in advance.

Could you provide a small code snippet that exhibits this behavior? This does sound like a bug, though, in which case we may want to track it as a GitHub issue.

I tested with a very cut down version of org.datavec.transform.basic.BasicDataVecExampleLocal.

Modified to a simple transform:

    TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
            .addConstantDoubleColumn("Test-Column", 20.0)
            .build();

and then loop the .execute:

    List<List<Writable>> processedData = null;
    for (int i = 0; i < 100000; i++) {
        processedData = LocalTransformExecutor.execute(originalData, tp);
        /* Any memory technique I could think of here */
    }

It still doesn’t let go of the memory until the JVM exits.

Thanks for the reply.

That does indeed look like a bug in DataVec. @AlexBlack is probably the one who knows that code best, but in any case, please open an issue on GitHub for this. Thanks!